
Releases: hpcaitech/ColossalAI

Version v0.4.4 Release Today!

19 Sep 02:53
dabc2e7

What's Changed

Release

Colossaleval

Moe

  • [moe] add parallel strategy for shared_expert && fix test for deepseek (#6063) by botbw
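
A hedged sketch of what wiring a MoE model into expert parallelism looks like with the MoE hybrid-parallel plugin. The plugin and its tp_size/pp_size/ep_size/zero_stage/precision arguments follow ColossalAI's documented interface, but treat the exact values and the tiny Mixtral stand-in model as illustrative assumptions rather than the precise setup exercised in #6063.

```python
# Hedged sketch: boost a small MoE model with MoeHybridParallelPlugin so the
# experts are sharded across an expert-parallel group (ep_size ranks).
# Argument names follow the documented plugin interface; verify them against
# the version you have installed.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import MoeHybridParallelPlugin
from transformers import MixtralConfig, MixtralForCausalLM

colossalai.launch_from_torch()  # assumes launch via torchrun

# tiny Mixtral-style MoE model as a stand-in for DeepSeek-MoE/Mixtral
config = MixtralConfig(
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=8,
    num_key_value_heads=8,
    num_local_experts=8,
    num_experts_per_tok=2,
)
model = MixtralForCausalLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

plugin = MoeHybridParallelPlugin(
    tp_size=1,      # tensor parallelism for non-expert weights
    pp_size=1,      # pipeline parallelism
    ep_size=4,      # experts are distributed over 4 ranks
    zero_stage=1,
    precision="bf16",
)
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)
```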

Sp

Doc

Fp8

Pre-commit.ci

Hotfix

  • [hotfix] moe hybrid parallelism benchmark & follow-up fix (#6048) by botbw

Feature

Full Changelog: v0.4.3...v0.4.4

Version v0.4.3 Release Today!

10 Sep 02:39
b3db105

What's Changed

Release

Fp8

Hotfix

Colossalai/checkpoint_io/...

  • [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020) by Gao, Ruiyuan

Colossal-llama

Plugin

Ci

  • [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan

Pre-commit.ci

Colossalchat

Misc

  • [misc] Use dist logger in plugins (#6011) by Edenzzzz (see the logger sketch after this list)
  • [misc] update compatibility (#6008) by Hongxin Liu
  • [misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
  • [misc] remove useless condition by haze188
  • [misc] fix ci failure: change default value to false in moe plugin by haze188
  • [misc] remove incompatible test config by haze188
  • [misc] remove debug/print code by haze188
  • [misc] skip redundant test by haze188
  • [misc] solve booster hang by rename the variable by haze188
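
A quick, hedged illustration of the distributed logger that the "[misc] Use dist logger in plugins" item refers to: get_dist_logger returns a per-process logger whose messages can be limited to chosen ranks, which is what the plugins now use instead of plain prints and warnings. Treat the snippet as a sketch assuming a torchrun launch.

```python
# Hedged sketch of ColossalAI's distributed logger (#6011).
# ranks=[0] restricts output to rank 0 so a large job does not repeat
# the same message once per process.
import colossalai
from colossalai.logging import get_dist_logger

colossalai.launch_from_torch()  # assumes launch via torchrun

logger = get_dist_logger()
logger.info("booster plugin initialized", ranks=[0])       # printed only on rank 0
logger.warning("falling back to default config", ranks=[0])
```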

Feature

Chat

Test ci

Docs

  • [Docs] clarify launch port by Edenzzzz

Test

  • [test] add zero fp8 test case by ver217
  • [test] add check by hxwang
  • [test] fix test: test_zero1_2 by hxwang
  • [test] add mixtral modelling test by botbw
  • [test] pass mixtral shardformer test by botbw
  • [test] mixtral pp shard test by hxwang
  • [test] add mixtral transformer test by hxwang
  • [test] add mixtral for sequence classification by hxwang

Lora

Feat

Chore

  • [chore] remove redundant test case, print string & reduce test tokens by botbw
  • [chore] docstring by hxwang
  • [chore] change moe_pg_mesh to private by hxwang
  • [chore] solve moe ckpt test failure and some other arg pass failure by hxwang
  • [chore] minor fix after rebase by hxwang
  • [chore] minor fix by hxwang
  • [chore] arg pass & remove drop token by hxwang
  • [chore] trivial fix by botbw
  • [chore] manually revert unintended commit by botbw
  • [chore] handle non member group by hxwang

Moe

  • [moe] solve dp axis issue by botbw
  • [moe] remove force_overlap_comm flag and add warning instead by hxwang
  • Revert "[moe] implement submesh initialization" by hxwang
  • [moe] refactor mesh assignment by hxwang
  • [moe] deepseek moe sp support by haze188
  • [moe] remove ops by hxwang
  • [moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
  • [moe] finalize test (no pp) by hxwang
  • [moe] init moe plugin comm setting with sp by hxwang
  • [moe] clean legacy code by hxwang
  • [moe] test deepseek by hxwang
  • [moe] implement tp by botbw
  • [moe] add mixtral dp grad scaling when not all experts are activated by botbw...

Version v0.4.2 Release Today!

31 Jul 02:06
09c5f72

What's Changed

Release

Zero

Feat

  • [Feat] Distrifusion Acceleration Support for Diffusion Inference (#5895) by Runyu Lu

Shardformer

Chat

Feature

  • [Feature] Add a switch to control whether the model checkpoint needs to be saved after each epoch ends (#5941) by zhurunhua

Hotfix

Fix bug

  • [FIX BUG] convert env param to int in (#5934) by Gao, Ruiyuan
  • [FIX BUG] UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value (#5931) by zhurunhua

Colossalchat

Examples

  • [Examples] Add lazy init to OPT and GPT examples (#5924) by Edenzzzz

Plugin

  • [plugin] support all-gather overlap for hybrid parallel (#5919) by Hongxin Liu
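
A minimal, hedged sketch of how this overlap might be switched on when building a HybridParallelPlugin with a ZeRO stage; the overlap_allgather flag name is inferred from the #5919 title and should be treated as an assumption to check against your installed version's plugin signature.

```python
# Hedged sketch: enable overlap between the ZeRO parameter all-gather and
# computation inside HybridParallelPlugin. The overlap_allgather flag name is
# an assumption based on #5919; verify it in your version's plugin signature.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()  # assumes launch via torchrun

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    zero_stage=1,              # the overlap applies to the ZeRO all-gather path
    precision="bf16",
    overlap_allgather=True,    # assumed flag name for the #5919 feature
)
booster = Booster(plugin=plugin)
```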

Full Changelog: v0.4.1...v0.4.2

Version v0.4.1 Release Today!

17 Jul 09:30
73494de

What's Changed

Release

Misc

Compatibility

Chat

Shardformer

Auto parallel

  • [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö

Zero

Pre-commit.ci

Feature

Hotfix

  • [HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
  • [Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
  • [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188

Feat

  • [Feat] Diffusion Model(PixArtAlpha/StableDiffusion3) Support (#5838) by Runyu Lu

Hotfix

  • [Hotfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap by Edenzzzz

Quant

Doc

  • [doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz

Moe/zero

  • [MoE/ZeRO] Moe refactor with zero refactor (#5821) by Haze188

Full Changelog: v0.4.0...v0.4.1

Version v0.4.0 Release Today!

28 Jun 02:51
eaea88c

What's Changed

Release

Inference

Shardformer

Zero

Gemini

Feature

Doc

Full Changelog: v0.3.9...v0.4.0

Version v0.3.9 Release Today!

20 Jun 05:35
bd3e34f

What's Changed

Release

Fix

  • [Fix] Fix spec-dec Glide LlamaModel for compatibility with transformers (#5837) by Yuanheng Zhao

Shardformer

Devops

Launch

  • [launch] Support IPv4 host initialization in launch (#5822) by Kai Lv
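
A hedged sketch of what IPv4 host initialization looks like through colossalai.launch: the host/port pair is the rendezvous address for torch.distributed, and rank/world_size are assumed to come from the launcher's environment variables. Depending on the ColossalAI version, launch() may also expect a config argument.

```python
# Hedged sketch of colossalai.launch with an IPv4 master address (#5822).
# rank/world_size are read from env vars set by the process launcher;
# some older releases additionally require a config argument.
import os
import colossalai

colossalai.launch(
    rank=int(os.environ["RANK"]),
    world_size=int(os.environ["WORLD_SIZE"]),
    host="192.168.1.10",   # IPv4 address of the rank-0 node
    port=29500,
    backend="nccl",
)
```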

Misc

Pre-commit.ci

Gemini

  • [gemini] quick fix on possible async operation (#5803) by botbw
  • [Gemini] Use async stream to prefetch and h2d data moving (#5781) by Haze188
  • [gemini] optimize reduce scatter d2h copy (#5760) by botbw

Inference

Test

Colossalchat

  • Merge pull request #5759 from hpcaitech/colossalchat_upgrade by YeAnbang

Install

Hotfix

Test/ci

  • [Test/CI] remove test cases to reduce CI duration (#5753) by botbw

Ci/tests

  • [CI/tests] simplify some test case to reduce testing time (#5755) by Haze188

Full Changelog: v0.3.8...v0.3.9

Version v0.3.8 Release Today!

31 May 11:41
68359ed

What's Changed

Release

Fix/example

Gemini

  • Merge pull request #5749 from hpcaitech/prefetch by botbw
  • Merge pull request #5754 from Hz188/prefetch by botbw
  • [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
  • [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
  • Merge pull request #5733 from Hz188/feature/prefetch by botbw
  • Merge pull request #5731 from botbw/prefetch by botbw
  • [gemini] init auto policy prefetch by hxwang
  • Merge pull request #5722 from botbw/prefetch by botbw
  • [gemini] maxprefetch means maximum work to keep by hxwang
  • [gemini] use compute_chunk to find next chunk by hxwang
  • [gemini] prefetch chunks by hxwang
  • [gemini]remove registered gradients hooks (#5696) by flybird11111

Chore

  • [chore] refactor profiler utils by hxwang
  • [chore] remove unnecessary assert since compute list might not be recorded by hxwang
  • [chore] remove unnecessary test & changes by hxwang
  • Merge pull request #5738 from botbw/prefetch by Haze188
  • [chore] fix init error by hxwang
  • [chore] Update placement_policy.py by botbw
  • [chore] remove debugging info by hxwang
  • [chore] remove print by hxwang
  • [chore] refactor & sync by hxwang
  • [chore] sync by hxwang

Bug

Bugs

  • [bugs] fix args.profile=False DummyProfiler error by genghaozhe

Inference

Feature

  • [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
  • [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz (see the optimizer sketch after this list)
  • Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
  • [Feature] qlora support (#5586) by linsj20
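
A hedged sketch of picking up one of the distributed optimizers from #5694 and boosting it; the DistributedLamb class name and its import path are assumptions to verify in your installed version, and #5746 later made the booster cast supported plain optimizers to their distributed variants automatically.

```python
# Hedged sketch for the distributed optimizers added in #5694. The class name
# DistributedLamb and the colossalai.nn.optimizer import path are assumptions;
# check your installed version. #5746 additionally auto-casts supported
# optimizers to distributed variants during booster.boost().
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin
from colossalai.nn.optimizer import DistributedLamb  # assumed name/path
from torch import nn

colossalai.launch_from_torch()  # older releases may still require a config arg

model = nn.Linear(1024, 1024)
optimizer = DistributedLamb(model.parameters(), lr=1e-3)

booster = Booster(plugin=LowLevelZeroPlugin(stage=1))
model, optimizer, *_ = booster.boost(model, optimizer)
```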

Example

Colossal-inference

  • [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao

Nfc

Ci

Sync

Shardformer

  • [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
  • [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
  • [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
  • Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
  • [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
  • [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111

Pre-commit.ci

Doc

Fix/inference

Lazy

Misc

Colossal-llama

  • [Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li

Fix

  • [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
  • [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
  • [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao

Feat

Hotfix


Version v0.3.7 Release Today!

27 Apr 11:00
4cfbf30

What's Changed

Release

Hotfix

  • [hotfix] add soft link to support required files (#5661) by Tong Li
  • [hotfix] Fixed fused layernorm bug without apex (#5609) by Edenzzzz
  • [hotfix] Fix examples no pad token & auto parallel codegen bug; (#5606) by Edenzzzz
  • [hotfix] fix typo s/get_defualt_parser /get_default_parser (#5548) by digger yu
  • [hotfix] quick fixes to make legacy tutorials runnable (#5559) by Edenzzzz
  • [hotfix] set return_outputs=False in examples and polish code (#5404) by Wenhao Chen
  • [hotfix] fix typo s/keywrods/keywords etc. (#5429) by digger yu

News

Lazyinit

Shardformer

Fix

  • [Fix]: implement thread-safety singleton to avoid deadlock for very large-scale training scenarios (#5625) by Season
  • [fix] fix typo s/muiti-node /multi-node etc. (#5448) by digger yu
  • [Fix] Grok-1 use tokenizer from the same pretrained path (#5532) by Yuanheng Zhao
  • [fix] fix grok-1 example typo (#5506) by Yuanheng Zhao

Coloattention

Example

Feature

Zero

Doc

Devops

Shardformer, pipeline

  • [shardformer, pipeline] add gradient_checkpointing_ratio and heterogeneous shard policy for llama (#5508) by Wenhao Chen

Colossalchat

Format

Full Changelog: v0.3.6...v0.3.7

Version v0.3.6 Release Today!

07 Mar 15:38
8020f42

What's Changed

Release

Colossal-llama2

  • [colossal-llama2] add stream chat example for chat version model (#5428) by Camille Zhong

Hotfix

Doc

Eval-hotfix

Devops

Example

Workflow

Shardformer

Setup

Fsdp

  • [fsdp] impl save/load shard model/optimizer (#5357) by QinLuo
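
A hedged sketch of sharded checkpointing through the Booster with the Torch FSDP plugin, which is roughly what #5357 enables; the shard=True and size_per_shard arguments follow Booster's documented checkpoint API but should be verified against the release in use.

```python
# Hedged sketch of sharded save/load with TorchFSDPPlugin (#5357).
# shard=True / size_per_shard follow Booster's documented checkpoint API;
# treat them as assumptions for the release you run.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin
from torch import nn

colossalai.launch_from_torch(config={})  # newer releases drop the config argument

model = nn.Linear(4096, 4096)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

booster = Booster(plugin=TorchFSDPPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)

# save model and optimizer states as multiple shards
booster.save_model(model, "ckpt/model", shard=True, size_per_shard=1024)
booster.save_optimizer(optimizer, "ckpt/optim", shard=True, size_per_shard=1024)

# later: restore into a boosted model/optimizer of the same shape
booster.load_model(model, "ckpt/model")
booster.load_optimizer(optimizer, "ckpt/optim")
```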

Extension

Llama

Full Changelog: v0.3.5...v0.3.6

Version v0.3.5 Release Today!

23 Feb 08:46
adae123

What's Changed

Release

Llama

Moe

Lr-scheduler

Eval

Gemini

Fix

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu
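
A hedged sketch of the optimizer-checkpoint path that this fix touches: saving and reloading optimizer state through the Booster while training under the Gemini plugin. The GeminiPlugin settings and the shard/size_per_shard keywords follow the documented API but are placeholders to confirm against your version.

```python
# Hedged sketch of the Gemini optimizer-checkpoint path addressed by #5347.
# Plugin settings and checkpoint kwargs are placeholders following the
# documented API; confirm them for your installed release.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam
from torch import nn

colossalai.launch_from_torch(config={})  # newer releases drop the config argument

model = nn.Linear(4096, 4096)
optimizer = HybridAdam(model.parameters(), lr=1e-4)

booster = Booster(plugin=GeminiPlugin(precision="bf16", placement_policy="static"))
model, optimizer, *_ = booster.boost(model, optimizer)

booster.save_optimizer(optimizer, "ckpt/optim", shard=True, size_per_shard=1024)
booster.load_optimizer(optimizer, "ckpt/optim")
```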

Chat

Extension

Doc

Tests

Accelerator

Workflow

Feat

Nfc

Hotfix

Sync

Shardformer

  • [shardformer] hybridparallelplugin support gradients accumulation. (#5246) by flybird11111
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer] fix flash attention, when mask is causal, just don't unpad it (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao

Ci

Npu

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen

Format
