{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":494232964,"defaultBranch":"main","name":"flash-attention","ownerLogin":"Dao-AILab","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2022-05-19T21:22:06.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/139507659?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1726526449.0","currentOid":""},"activityList":{"items":[{"before":"d7ca6437954990fb3f38523abb2d9548cdff2288","after":"7876a02ee079e564845f5042e394c680cc7b39c0","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-21T01:30:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ganeshcolfax","name":null,"path":"/ganeshcolfax","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/141785359?s=80&v=4"},"commit":{"message":"add gqa decoding logic.","shortMessageHtmlLink":"add gqa decoding logic."}},{"before":"d9bd088c2cc8ff64d71a98c6c036087737a2e68c","after":"d7ca6437954990fb3f38523abb2d9548cdff2288","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-21T01:03:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"initialize semaphore when num splits != 1","shortMessageHtmlLink":"initialize semaphore when num splits != 1"}},{"before":"a5db3c137ac4b2c6f1c65ee1195aad4d07e613c8","after":"d9bd088c2cc8ff64d71a98c6c036087737a2e68c","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-21T00:11:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ganeshcolfax","name":null,"path":"/ganeshcolfax","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/141785359?s=80&v=4"},"commit":{"message":"adding block_n and block_m for different headdim.","shortMessageHtmlLink":"adding block_n and block_m for different headdim."}},{"before":"267628ff70d7937f5529ae2e757ed2959d8e8501","after":"a5db3c137ac4b2c6f1c65ee1195aad4d07e613c8","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-20T21:15:05.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"ganeshcolfax","name":null,"path":"/ganeshcolfax","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/141785359?s=80&v=4"},"commit":{"message":"add num_split_heuristics.","shortMessageHtmlLink":"add num_split_heuristics."}},{"before":"8476986721ca38a0f56766d925fad12ae0ae9358","after":"53a4f341634fcbc96bb999a3c804c192ea14f2ea","ref":"refs/heads/main","pushedAt":"2024-09-20T19:45:25.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tridao","name":"Tri Dao","path":"/tridao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5616128?s=80&v=4"},"commit":{"message":"Hotfix due to change of upstream api (#1239)","shortMessageHtmlLink":"Hotfix due to change of upstream api (#1239)"}},{"before":"9cafd4ae140b52dc2c95be1a1c6aeb24925a883d","after":"8476986721ca38a0f56766d925fad12ae0ae9358","ref":"refs/heads/main","pushedAt":"2024-09-20T19:44:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tridao","name":"Tri Dao","path":"/tridao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5616128?s=80&v=4"},"commit":{"message":"Fix FAv3 compilation with MSVC (#1240)","shortMessageHtmlLink":"Fix FAv3 compilation with MSVC (#1240)"}},{"before":"9dd67426350dc85ab211980528de1959d2d8b7b5","after":"267628ff70d7937f5529ae2e757ed2959d8e8501","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-20T06:33:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"remove unused code","shortMessageHtmlLink":"remove unused code"}},{"before":"0375badecdb4e4b1ba354d9a68dee482a8126ceb","after":"9dd67426350dc85ab211980528de1959d2d8b7b5","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-20T06:22:59.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"Merge branch 'fa3-kvcache-gqa' of github.com:Dao-AILab/flash-attention into fa3-kvcache-gqa","shortMessageHtmlLink":"Merge branch 'fa3-kvcache-gqa' of github.com:Dao-AILab/flash-attentio…"}},{"before":"30e1ef0f79418af5ad52987e49d691f0d4519c46","after":"9cafd4ae140b52dc2c95be1a1c6aeb24925a883d","ref":"refs/heads/main","pushedAt":"2024-09-20T06:14:45.000Z","pushType":"pr_merge","commitsCount":5,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"Merge pull request #1233 from Dao-AILab/ipiszy/local_attn\n\nAdd local attention in Hopper FAv3","shortMessageHtmlLink":"Merge pull request #1233 from Dao-AILab/ipiszy/local_attn"}},{"before":"ccd561b6a0861bb709f06807fcf4e0cb68e50274","after":"1c9717d699c720ce62b662b376ce224988609fbd","ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-20T05:51:04.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"address comments","shortMessageHtmlLink":"address comments"}},{"before":"af5af2072d3b6e15617bdd2d12af40d0fa1932ac","after":"ccd561b6a0861bb709f06807fcf4e0cb68e50274","ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-20T05:46:33.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"address comments","shortMessageHtmlLink":"address comments"}},{"before":"be6c1b98c40785fa39d73954f5e84a7e1657d261","after":"af5af2072d3b6e15617bdd2d12af40d0fa1932ac","ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-20T05:07:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"address comments","shortMessageHtmlLink":"address comments"}},{"before":"f07dcdd2ac64d8dbe136955c37603c2b4835f9d3","after":"0375badecdb4e4b1ba354d9a68dee482a8126ceb","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-19T23:21:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ganeshcolfax","name":null,"path":"/ganeshcolfax","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/141785359?s=80&v=4"},"commit":{"message":"changes for correct lse write out for splits=1 and splits > 1 case.","shortMessageHtmlLink":"changes for correct lse write out for splits=1 and splits > 1 case."}},{"before":"020ecf8907dbbeb04d6586fbdb50da972b38d89d","after":"f07dcdd2ac64d8dbe136955c37603c2b4835f9d3","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-19T21:19:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"revert OutputType change","shortMessageHtmlLink":"revert OutputType change"}},{"before":"986247a7ed574467772eb574b5f8e09d043c2c36","after":"020ecf8907dbbeb04d6586fbdb50da972b38d89d","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-19T18:25:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"fix wrong tile size for hdim 64","shortMessageHtmlLink":"fix wrong tile size for hdim 64"}},{"before":"c6b1c1f23b7c298ba9db2fb2431fdc7cb3de924a","after":"986247a7ed574467772eb574b5f8e09d043c2c36","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-19T16:45:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"change Element to OutputType for template param in combine kernel. Only matters for fp8 support","shortMessageHtmlLink":"change Element to OutputType for template param in combine kernel. On…"}},{"before":"18cbd9cea0722dd252d8679468de0ac00eea86ed","after":"c6b1c1f23b7c298ba9db2fb2431fdc7cb3de924a","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-19T16:38:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"avoid redundant compilation with combine kernel by only including needed template params","shortMessageHtmlLink":"avoid redundant compilation with combine kernel by only including nee…"}},{"before":"83e41b3ca497aa9e5df94eb9faf2593387c87496","after":"30e1ef0f79418af5ad52987e49d691f0d4519c46","ref":"refs/heads/main","pushedAt":"2024-09-18T07:32:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tridao","name":"Tri Dao","path":"/tridao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5616128?s=80&v=4"},"commit":{"message":"minify torch.torch.int32 to torch.int32 (#1237)","shortMessageHtmlLink":"minify torch.torch.int32 to torch.int32 (#1237)"}},{"before":"af314d400663fe895199b0586a9f1f718b1d7b79","after":"83e41b3ca497aa9e5df94eb9faf2593387c87496","ref":"refs/heads/main","pushedAt":"2024-09-18T02:49:26.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tridao","name":"Tri Dao","path":"/tridao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5616128?s=80&v=4"},"commit":{"message":"Add custom ops for compatibility with PT Compile (#1139)\n\n* Add custom ops for compatibility with PT Compile\r\n\r\n* Add support for varlen functions too\r\n\r\n* Add version checks for pytorch API\r\n\r\n* Fix PT compile interfaces so it works e2e\r\n\r\n* Make sure PT < 2.4 runs fine\r\n\r\n* Fix python mistake\r\n\r\n* Fix all the autograd magic issues\r\n\r\n* typo on head_dim\r\n\r\n* Fix deterministic test failures, remove unneeded detaches()\r\n\r\n* remove test requires_grad\r\n\r\n* Resolve all the pytorch versioning issues\r\n\r\n* C++ and python refactor to improve padding management for torch.compile()\r\n\r\n* Add improvements suggested by @anijain2305","shortMessageHtmlLink":"Add custom ops for compatibility with PT Compile (#1139)"}},{"before":"1c38e5bbc8f593c88a2b0e5ae8679068dfecfcbe","after":"18cbd9cea0722dd252d8679468de0ac00eea86ed","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-18T01:49:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"change flash api for rebase","shortMessageHtmlLink":"change flash api for rebase"}},{"before":"2c105a02aaa0579eef5aebcd7f80e129d6be63ce","after":"1c38e5bbc8f593c88a2b0e5ae8679068dfecfcbe","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-18T01:46:28.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"add hid=64.","shortMessageHtmlLink":"add hid=64."}},{"before":"820366a6f5baf0fb7e7d1446350e8f467efcc0b5","after":"2c105a02aaa0579eef5aebcd7f80e129d6be63ce","ref":"refs/heads/fa3-kvcache-gqa","pushedAt":"2024-09-17T02:00:40.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"jayhshah","name":null,"path":"/jayhshah","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/17012019?s=80&v=4"},"commit":{"message":"Merge branch 'fa3-kvcache-gqa' of github.com:Dao-AILab/flash-attention into fa3-kvcache-gqa","shortMessageHtmlLink":"Merge branch 'fa3-kvcache-gqa' of github.com:Dao-AILab/flash-attentio…"}},{"before":"701a5564f4a990b7378b514189eda8fa7359dbdc","after":"be6c1b98c40785fa39d73954f5e84a7e1657d261","ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-16T23:13:39.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"small fixes","shortMessageHtmlLink":"small fixes"}},{"before":"dff976a84aabdf191761d7458d2c7aae68b3afb6","after":"701a5564f4a990b7378b514189eda8fa7359dbdc","ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-16T22:51:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"small fixes","shortMessageHtmlLink":"small fixes"}},{"before":"ca42d691dd159b10974daf3ea19a945c64010ebc","after":"dff976a84aabdf191761d7458d2c7aae68b3afb6","ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-16T22:44:57.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"fixes","shortMessageHtmlLink":"fixes"}},{"before":"7c3b11105d0ebd67e235c397e14d242604c8550f","after":"ca42d691dd159b10974daf3ea19a945c64010ebc","ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-16T22:42:26.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"fixes","shortMessageHtmlLink":"fixes"}},{"before":null,"after":"7c3b11105d0ebd67e235c397e14d242604c8550f","ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-16T22:40:49.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"fixes","shortMessageHtmlLink":"fixes"}},{"before":"7c3b11105d0ebd67e235c397e14d242604c8550f","after":null,"ref":"refs/heads/ipiszy/local_attn","pushedAt":"2024-09-16T22:38:59.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"}},{"before":"e2182cc21d5be2a1d71c8ca7eb1bc425563041f1","after":"af314d400663fe895199b0586a9f1f718b1d7b79","ref":"refs/heads/main","pushedAt":"2024-09-16T21:57:19.000Z","pushType":"pr_merge","commitsCount":4,"pusher":{"login":"ipiszy","name":"Ying Zhang","path":"/ipiszy","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10527447?s=80&v=4"},"commit":{"message":"Merge pull request #1182 from ipiszy/used_q\n\nAdd seqused_q in fwd / bwd and seqused_k in bwd in hopper FA.","shortMessageHtmlLink":"Merge pull request #1182 from ipiszy/used_q"}},{"before":"cc1690d9d6397afb2b2844b39a189c8e2374903f","after":"e2182cc21d5be2a1d71c8ca7eb1bc425563041f1","ref":"refs/heads/main","pushedAt":"2024-09-16T06:17:28.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"tridao","name":"Tri Dao","path":"/tridao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5616128?s=80&v=4"},"commit":{"message":"Support page kvcache in AMD ROCm (#1198)\n\n* Integrate ck branch of ck_tile/fa_bwd_opt\r\n\r\n* Assume dq and q share the same stride\r\n\r\n* update ck\r\n\r\n* Integrate more stride of dq_acc\r\n\r\n* Revert fwd dropout\r\n\r\n* Fix paremeter order\r\n\r\n* Integrate ck with more stride\r\n\r\n* update the limit of hdim of bwd\r\n\r\n* Check argument\r\n\r\n* Add test_flash_attn_causal\r\n\r\n* Support unpad lse\r\n\r\n* Add test_flash_attn_varlen_causal, test_flash_attn_race_condition, test_flash_attn_bwd_overflow, test_flash_attn_bwd_transpose, test_flash_attn_bwd_varlen_overflow, test_flash_attn_deterministic, test_flash_attn_varlen_deterministic\r\n\r\n* Fix stride and Kn0\r\n\r\n* Fix CK sync issue\r\n\r\n* Fix typo\r\n\r\n* Update CK for changing of fmha_fwd_args\r\n\r\n* Add kvcache tmp\r\n\r\n* Add kvcache\r\n\r\n* Fix comment\r\n\r\n* Sync behavior with ck\r\n\r\n* Update CK to develop\r\n\r\n* remove large test case\r\n\r\n* Add kvcache test\r\n\r\n* Fix page_block_size in arg\r\n\r\n* Minor fix\r\n\r\n* Fix stride error\r\n\r\n* Update seqlen of kvcache before splitkv\r\n\r\n* Fix compile error\r\n\r\n* Fix bug of hdim is not 8x\r\n\r\n* Fit ck arg\r\n\r\n* support adaptive num_splits\r\n\r\n* add more tests\r\n\r\n* Refine test tolerance\r\n\r\n* update CK\r\n\r\n* Move override_num_splits_if_necessary into cpp\r\n\r\n* update ck\r\n\r\n* Update ck\r\n\r\n* Support different flag for different version of hip\r\n\r\n* remove coerce-illegal, becasue this is not required in FA\r\n\r\n* Update ck to fix xcratch memory\r\n\r\n* Add coerce-illegal in some version\r\n\r\n* Add compile flag for rtn rounding\r\n\r\n* remove redundant init\r\n\r\n* Using env var to switch rounding mode\r\n\r\n* update ck","shortMessageHtmlLink":"Support page kvcache in AMD ROCm (#1198)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yMVQwMTozMDo0My4wMDAwMDBazwAAAAS8ewPi","startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0yMVQwMTozMDo0My4wMDAwMDBazwAAAAS8ewPi","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0xNlQwNjoxNzoyOC4wMDAwMDBazwAAAAS3FUpD"}},"title":"Activity · Dao-AILab/flash-attention"}