
Releases: hpcaitech/ColossalAI

Version v0.4.4 Release Today!

19 Sep 02:53
dabc2e7

What's Changed

Release

Colossaleval

Moe

  • [moe] add parallel strategy for shared_expert && fix test for deepseek (#6063) by botbw
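
A hedged sketch of what wiring a MoE model into expert parallelism looks like with the MoE hybrid-parallel plugin. The plugin and its tp_size/pp_size/ep_size/zero_stage/precision arguments follow ColossalAI's documented interface, but treat the exact values and the tiny Mixtral stand-in model as illustrative assumptions rather than the precise setup exercised in #6063.

```python
# Hedged sketch: boost a small MoE model with MoeHybridParallelPlugin so the
# experts are sharded across an expert-parallel group (ep_size ranks).
# Argument names follow the documented plugin interface; verify them against
# the version you have installed.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import MoeHybridParallelPlugin
from transformers import MixtralConfig, MixtralForCausalLM

colossalai.launch_from_torch()  # assumes launch via torchrun

# tiny Mixtral-style MoE model as a stand-in for DeepSeek-MoE/Mixtral
config = MixtralConfig(
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=8,
    num_key_value_heads=8,
    num_local_experts=8,
    num_experts_per_tok=2,
)
model = MixtralForCausalLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

plugin = MoeHybridParallelPlugin(
    tp_size=1,      # tensor parallelism for non-expert weights
    pp_size=1,      # pipeline parallelism
    ep_size=4,      # experts are distributed over 4 ranks
    zero_stage=1,
    precision="bf16",
)
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)
```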

Sp

Doc

Fp8

Pre-commit.ci

Hotfix

  • [hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048) by botbw

Feature

Full Changelog: v0.4.3...v0.4.4

Version v0.4.3 Release Today!

10 Sep 02:39
b3db105

What's Changed

Release

Fp8

Hotfix

Colossalai/checkpoint_io/...

  • [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020) by Gao, Ruiyuan

Colossal-llama

Plugin

Ci

  • [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan

Pre-commit.ci

Colossalchat

Misc

  • [misc] Use dist logger in plugins (#6011) by Edenzzzz (see the logger sketch after this list)
  • [misc] update compatibility (#6008) by Hongxin Liu
  • [misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
  • [misc] remove useless condition by haze188
  • [misc] fix ci failure: change default value to false in moe plugin by haze188
  • [misc] remove incompatible test config by haze188
  • [misc] remove debug/print code by haze188
  • [misc] skip redundant test by haze188
  • [misc] solve booster hang by rename the variable by haze188
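
A quick, hedged illustration of the distributed logger that the "[misc] Use dist logger in plugins" item refers to: get_dist_logger returns a per-process logger whose messages can be limited to chosen ranks, which is what the plugins now use instead of plain prints and warnings. Treat the snippet as a sketch assuming a torchrun launch.

```python
# Hedged sketch of ColossalAI's distributed logger (#6011).
# ranks=[0] restricts output to rank 0 so a large job does not repeat
# the same message once per process.
import colossalai
from colossalai.logging import get_dist_logger

colossalai.launch_from_torch()  # assumes launch via torchrun

logger = get_dist_logger()
logger.info("booster plugin initialized", ranks=[0])       # printed only on rank 0
logger.warning("falling back to default config", ranks=[0])
```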

Feature

Chat

Test ci

Docs

  • [Docs] clarify launch port by Edenzzzz

Test

  • [test] add zero fp8 test case by ver217
  • [test] add check by hxwang
  • [test] fix test: test_zero1_2 by hxwang
  • [test] add mixtral modelling test by botbw
  • [test] pass mixtral shardformer test by botbw
  • [test] mixtral pp shard test by hxwang
  • [test] add mixtral transformer test by hxwang
  • [test] add mixtral for sequence classification by hxwang

Lora

Feat

Chore

  • [chore] remove redundant test case, print string & reduce test tokens by botbw
  • [chore] docstring by hxwang
  • [chore] change moe_pg_mesh to private by hxwang
  • [chore] solve moe ckpt test failure and some other arg pass failure by hxwang
  • [chore] minor fix after rebase by hxwang
  • [chore] minor fix by hxwang
  • [chore] arg pass & remove drop token by hxwang
  • [chore] trivial fix by botbw
  • [chore] manually revert unintended commit by botbw
  • [chore] handle non member group by hxwang

Moe

  • [moe] solve dp axis issue by botbw
  • [moe] remove force_overlap_comm flag and add warning instead by hxwang
  • Revert "[moe] implement submesh initialization" by hxwang
  • [moe] refactor mesh assignment by hxwang
  • [moe] deepseek moe sp support by haze188
  • [moe] remove ops by hxwang
  • [moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
  • [moe] finalize test (no pp) by hxwang
  • [moe] init moe plugin comm setting with sp by hxwang
  • [moe] clean legacy code by hxwang
  • [moe] test deepseek by hxwang
  • [moe] implement tp by botbw
  • [moe] add mixtral dp grad scaling when not all experts are activated by botbw...

Version v0.4.2 Release Today!

31 Jul 02:06
09c5f72

What's Changed

Release

Zero

Feat

  • [Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895) by Runyu Lu

Shardformer

Chat

Feature

  • [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (#5941) by zhurunhua

Hotfix

Fix bug

  • [FIX BUG] convert env param to int in (#5934) by Gao, Ruiyuan
  • [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (#5931) by zhurunhua

Colossalchat

Examples

  • [Examples] Add lazy init to OPT and GPT examples (#5924) by Edenzzzz

Plugin

  • [plugin] support all-gather overlap for hybrid parallel (#5919) by Hongxin Liu
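
A minimal, hedged sketch of how this overlap might be switched on when building a HybridParallelPlugin with a ZeRO stage; the overlap_allgather flag name is inferred from the #5919 title and should be treated as an assumption to check against your installed version's plugin signature.

```python
# Hedged sketch: enable overlap between the ZeRO parameter all-gather and
# computation inside HybridParallelPlugin. The overlap_allgather flag name is
# an assumption based on #5919; verify it in your version's plugin signature.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()  # assumes launch via torchrun

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    zero_stage=1,              # the overlap applies to the ZeRO all-gather path
    precision="bf16",
    overlap_allgather=True,    # assumed flag name for the #5919 feature
)
booster = Booster(plugin=plugin)
```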

Full Changelog: v0.4.1...v0.4.2

Version v0.4.1 Release Today!

17 Jul 09:30
73494de

What's Changed

Release

Misc

Compatibility

Chat

Shardformer

Auto parallel

  • [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö

Zero

Pre-commit.ci

Feature

Hotfix

  • [HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
  • [Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
  • [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188

Feat

  • [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838) by Runyu Lu

Hotfix

  • [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap by Edenzzzz

Quant

Doc

  • [doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz

Moe/zero

  • [MoE/ZeRO] Moe refactor with zero refactor (#5821) by Haze188

Full Changelog: v0.4.0...v0.4.1

Version v0.4.0 Release Today!

28 Jun 02:51
eaea88c

What's Changed

Release

Inference

Shardformer

Zero

Gemini

Feature

Doc

Full Changelog: v0.3.9...v0.4.0

Version v0.3.9 Release Today!

20 Jun 05:35
bd3e34f

What's Changed

Release

Fix

  • [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) by Yuanheng Zhao

Shardformer

Devops

Launch

  • [launch] Support IPv4 host initialization in launch (#5822) by Kai Lv
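
A hedged sketch of what IPv4 host initialization looks like through colossalai.launch: the host/port pair is the rendezvous address for torch.distributed, and rank/world_size are assumed to come from the launcher's environment variables. Depending on the ColossalAI version, launch() may also expect a config argument.

```python
# Hedged sketch of colossalai.launch with an IPv4 master address (#5822).
# rank/world_size are read from env vars set by the process launcher;
# some older releases additionally require a config argument.
import os
import colossalai

colossalai.launch(
    rank=int(os.environ["RANK"]),
    world_size=int(os.environ["WORLD_SIZE"]),
    host="192.168.1.10",   # IPv4 address of the rank-0 node
    port=29500,
    backend="nccl",
)
```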

Misc

Pre-commit.ci

Gemini

  • [gemini] quick fix on possible async operation (#5803) by botbw
  • [Gemini] Use async stream to prefetch and h2d data moving (#5781) by Haze188
  • [gemini] optimize reduce scatter d2h copy (#5760) by botbw

Inference

Test

Colossalchat

  • Merge pull request #5759 from hpcaitech/colossalchat_upgrade by YeAnbang

Install

Hotfix

Test/ci

  • [Test/CI] remove test cases to reduce CI duration (#5753) by botbw

Ci/tests

  • [CI/tests] simplify some test case to reduce testing time (#5755) by Haze188

Full Changelog: v0.3.8...v0.3.9

Version v0.3.8 Release Today!

31 May 11:41
68359ed

What's Changed

Release

Fix/example

Gemini

  • Merge pull request #5749 from hpcaitech/prefetch by botbw
  • Merge pull request #5754 from Hz188/prefetch by botbw
  • [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
  • [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
  • Merge pull request #5733 from Hz188/feature/prefetch by botbw
  • Merge pull request #5731 from botbw/prefetch by botbw
  • [gemini] init auto policy prefetch by hxwang
  • Merge pull request #5722 from botbw/prefetch by botbw
  • [gemini] maxprefetch means maximum work to keep by hxwang
  • [gemini] use compute_chunk to find next chunk by hxwang
  • [gemini] prefetch chunks by hxwang
  • [gemini]remove registered gradients hooks (#5696) by flybird11111

Chore

  • [chore] refactor profiler utils by hxwang
  • [chore] remove unnecessary assert since compute list might not be recorded by hxwang
  • [chore] remove unnecessary test & changes by hxwang
  • Merge pull request #5738 from botbw/prefetch by Haze188
  • [chore] fix init error by hxwang
  • [chore] Update placement_policy.py by botbw
  • [chore] remove debugging info by hxwang
  • [chore] remove print by hxwang
  • [chore] refactor & sync by hxwang
  • [chore] sync by hxwang

Bug

Bugs

  • [bugs] fix args.profile=False DummyProfiler error by genghaozhe

Inference

Feature

  • [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
  • [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz (see the optimizer sketch after this list)
  • Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
  • [Feature] qlora support (#5586) by linsj20
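
A hedged sketch of picking up one of the distributed optimizers from #5694 and boosting it; the DistributedLamb class name and its import path are assumptions to verify in your installed version, and #5746 later made the booster cast supported plain optimizers to their distributed variants automatically.

```python
# Hedged sketch for the distributed optimizers added in #5694. The class name
# DistributedLamb and the colossalai.nn.optimizer import path are assumptions;
# check your installed version. #5746 additionally auto-casts supported
# optimizers to distributed variants during booster.boost().
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin
from colossalai.nn.optimizer import DistributedLamb  # assumed name/path
from torch import nn

colossalai.launch_from_torch()  # older releases may still require a config arg

model = nn.Linear(1024, 1024)
optimizer = DistributedLamb(model.parameters(), lr=1e-3)

booster = Booster(plugin=LowLevelZeroPlugin(stage=1))
model, optimizer, *_ = booster.boost(model, optimizer)
```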

Example

Colossal-inference

  • [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao

Nfc

Ci

Sync

Shardformer

  • [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
  • [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
  • [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
  • Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
  • [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
  • [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111

Pre-commit.ci

Doc

Fix/inference

Lazy

Misc

Colossal-llama

  • [Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li

Fix

  • [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
  • [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
  • [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao

Feat

Hotfix


Version v0.3.7 Release Today!

27 Apr 11:00
4cfbf30

What's Changed

Release

Hotfix

  • [hotfix] add soft link to support required files (#5661) by Tong Li
  • [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
  • [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
  • [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
  • [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
  • [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
  • [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu

News

Lazyinit

Shardformer

Fix

  • [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season
  • [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
  • [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
  • [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao

Coloattention

Example

Feature

Zero

Doc

Devops

Shardformer, pipeline

  • [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen

Colossalchat

Format

Full Changelog: v0.3.6...v0.3.7

Version v0.3.6 Release Today!

07 Mar 15:38
8020f42

What's Changed

Release

Colossal-llama2

  • [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong

Hotfix

Doc

Eval-hotfix

Devops

Example

Workflow

Shardformer

Setup

Fsdp

  • [fsdp] impl save/load shard model/optimizer (#5357) by QinLuo
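
A hedged sketch of sharded checkpointing through the Booster with the Torch FSDP plugin, which is roughly what #5357 enables; the shard=True and size_per_shard arguments follow Booster's documented checkpoint API but should be verified against the release in use.

```python
# Hedged sketch of sharded save/load with TorchFSDPPlugin (#5357).
# shard=True / size_per_shard follow Booster's documented checkpoint API;
# treat them as assumptions for the release you run.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin
from torch import nn

colossalai.launch_from_torch(config={})  # newer releases drop the config argument

model = nn.Linear(4096, 4096)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

booster = Booster(plugin=TorchFSDPPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)

# save model and optimizer states as multiple shards
booster.save_model(model, "ckpt/model", shard=True, size_per_shard=1024)
booster.save_optimizer(optimizer, "ckpt/optim", shard=True, size_per_shard=1024)

# later: restore into a boosted model/optimizer of the same shape
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optim")
```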

Extension

Llama

Full Changelog: v0.3.5...v0.3.6

Version v0.3.5 Release Today!

23 Feb 08:46
adae123

What's Changed

Release

Llama

Moe

Lr-scheduler

Eval

Gemini

Fix

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu
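
A hedged sketch of the optimizer-checkpoint path that this fix touches: saving and reloading optimizer state through the Booster while training under the Gemini plugin. The GeminiPlugin settings and the shard/size_per_shard keywords follow the documented API but are placeholders to confirm against your version.

```python
# Hedged sketch of the Gemini optimizer-checkpoint path addressed by #5347.
# Plugin settings and checkpoint kwargs are placeholders following the
# documented API; confirm them for your installed release.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from torch import nn

colossalai.launch_from_torch(config={})  # newer releases drop the config argument

model = nn.Linear(4096, 4096)
optimizer = HybridAdam(model.parameters(), lr=1e-4)

booster = Booster(plugin=GeminiPlugin(precision="bf16", placement_policy="static"))
model, optimizer, *_ = booster.boost(model, optimizer)

booster.save_optimizer(optimizer, "ckpt/optim", shard=True, size_per_shard=1024)
booster.load_optimizer(optimizer, "ckpt/optim")
```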

Chat

Extension

Doc

Tests

Accelerator

Workflow

Feat

Nfc

Hotfix

Sync

Shardformer

  • [shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer] fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Ci

Npu

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen

Format
