Releases: hpcaitech/ColossalAI
Version v0.2.3 Release Today!
What's Changed
Doc
- [doc] add CVPR tutorial (#2666) by binmakeswell
Docs
- [Docs] layout converting management (#2665) by YuliangLiu0306
Autoparallel
- [autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647) by Boyuan Yao
Full Changelog: v0.2.2...v0.2.3
Version v0.2.2 Release Today!
What's Changed
Workflow
- [workflow] fixed gpu memory check condition (#2659) by Frank Lee
- [workflow] fixed the test coverage report (#2614) by Frank Lee
- [workflow] fixed test coverage report (#2611) by Frank Lee
Example
- [example] Polish README.md (#2658) by Jiatong (Julius) Han
Doc
- [doc] fixed compatibility with docusaurus (#2657) by Frank Lee
- [doc] added docusaurus-based version control (#2656) by Frank Lee
- [doc] migrate the markdown files (#2652) by Frank Lee
- [doc] fix typo of BLOOM (#2643) by binmakeswell
- [doc] removed pre-built wheel installation from readme (#2637) by Frank Lee
- [doc] updated the sphinx theme (#2635) by Frank Lee
- [doc] fixed broken badge (#2623) by Frank Lee
Autoparallel
- [autoparallel] refactor handlers which reshape input tensors (#2615) by YuliangLiu0306
- [autoparallel] adapt autoparallel tests with latest api (#2626) by YuliangLiu0306
- [autoparallel] Patch meta information of `torch.matmul` (#2584) by Boyuan Yao
Tutorial
- [tutorial] added energonai to opt inference requirements (#2625) by Frank Lee
- [tutorial] add video link (#2619) by binmakeswell
Full Changelog: v0.2.1...v0.2.2
Version v0.2.1 Release Today!
What's Changed
Workflow
- [workflow] fixed broken release workflows (#2604) by Frank Lee
- [workflow] added cuda extension build test before release (#2598) by Frank Lee
- [workflow] hooked pypi release with lark (#2596) by Frank Lee
- [workflow] hooked docker release with lark (#2594) by Frank Lee
- [workflow] added test-pypi check before release (#2591) by Frank Lee
- [workflow] fixed the typo in the example check workflow (#2589) by Frank Lee
- [workflow] hook compatibility test failure to lark (#2586) by Frank Lee
- [workflow] hook example test alert with lark (#2585) by Frank Lee
- [workflow] added notification if scheduled build fails (#2574) by Frank Lee
- [workflow] added discussion stats to community report (#2572) by Frank Lee
- [workflow] refactored compatibility test workflow for maintainability (#2560) by Frank Lee
- [workflow] adjust the GPU memory threshold for scheduled unit test (#2558) by Frank Lee
- [workflow] fixed example check workflow (#2554) by Frank Lee
- [workflow] fixed typos in the leaderboard workflow (#2567) by Frank Lee
- [workflow] added contributor and user-engagement report (#2564) by Frank Lee
- [workflow] only report coverage for changed files (#2524) by Frank Lee
- [workflow] fixed the precommit CI (#2525) by Frank Lee
- [workflow] fixed changed file detection (#2515) by Frank Lee
- [workflow] fixed the skip condition of example weekly check workflow (#2481) by Frank Lee
- [workflow] automated bdist wheel build (#2459) by Frank Lee
- [workflow] automated the compatibility test (#2453) by Frank Lee
- [workflow] fixed the on-merge condition check (#2452) by Frank Lee
- [workflow] make test coverage report collapsable (#2436) by Frank Lee
- [workflow] report test coverage even if below threshold (#2431) by Frank Lee
- [workflow] auto comment with test coverage report (#2419) by Frank Lee
- [workflow] auto comment if precommit check fails (#2417) by Frank Lee
- [workflow] added translation for non-english comments (#2414) by Frank Lee
- [workflow] added precommit check for code consistency (#2401) by Frank Lee
- [workflow] refactored the example check workflow (#2411) by Frank Lee
- [workflow] added nightly release to pypi (#2403) by Frank Lee
- [workflow] added missing file change detection output (#2387) by Frank Lee
- [workflow] New version: Create workflow files for examples' auto check (#2298) by ziyuhuang123
- [workflow] fixed pypi release workflow error (#2328) by Frank Lee
- [workflow] fixed pypi release workflow error (#2327) by Frank Lee
- [workflow] added workflow to release to pypi upon version change (#2320) by Frank Lee
- [workflow] removed unused assign reviewer workflow (#2318) by Frank Lee
- [workflow] rebuild cuda kernels when kernel-related files change (#2317) by Frank Lee
Doc
- [doc] updated readme for CI/CD (#2600) by Frank Lee
- [doc] fixed issue link in pr template (#2577) by Frank Lee
- [doc] updated the CHANGE_LOG.md for github release page (#2552) by Frank Lee
- [doc] fixed the typo in pr template (#2556) by Frank Lee
- [doc] added pull request template (#2550) by Frank Lee
- [doc] update example link (#2520) by binmakeswell
- [doc] update opt and tutorial links (#2509) by binmakeswell
- [doc] added documentation for CI/CD (#2420) by Frank Lee
- [doc] updated kernel-related optimisers' docstring (#2385) by Frank Lee
- [doc] updated readme regarding pypi installation (#2406) by Frank Lee
- [doc] hotfix #2377 by Jiarui Fang
- [doc] update stable diffusion link (#2322) by binmakeswell
- [doc] update diffusion doc (#2296) by binmakeswell
- [doc] update news (#2295) by binmakeswell
- [doc] update news by binmakeswell
Setup
- [setup] fixed inconsistent version meta (#2578) by Frank Lee
- [setup] refactored setup.py for dependency graph (#2413) by Frank Lee
- [setup] support pre-build and jit-build of cuda kernels (#2374) by Frank Lee
- [setup] make cuda extension build optional (#2336) by Frank Lee
- [setup] remove torch dependency (#2333) by Frank Lee
- [setup] removed the build dependency on colossalai (#2307) by Frank Lee
Tutorial
- [tutorial] polish README (#2568) by binmakeswell
- [tutorial] update fastfold tutorial (#2565) by oahzxl
Polish
- [polish] polish ColoTensor and its submodules (#2537) by HELSON
- [polish] polish code for get_static_torch_model (#2405) by HELSON
Hotfix
- [hotfix] fix zero ddp warmup check (#2545) by ver217
- [hotfix] fix autoparallel demo (#2533) by YuliangLiu0306
- [hotfix] fix lightning error (#2529) by HELSON
- [hotfix] meta tensor default device. (#2510) by Super Daniel
- [hotfix] gpt example titans bug #2493 (#2494) by Jiarui Fang
- [hotfix] add norm clearing for the overflow step (#2416) by HELSON
- [hotfix] add DISTPAN argument for benchmark (#2412) by HELSON
- [hotfix] fix gpt gemini example (#2404) by HELSON
- [hotfix] issue #2388 by Jiarui Fang
- [hotfix] fix implement error in diffusers by Jiarui Fang
Autochunk
- [autochunk] add benchmark for transformer and alphafold (#2543) by oahzxl
- [autochunk] support multi outputs chunk search (#2538) by oahzxl
- [autochunk] support transformer (#2526) by oahzxl
- [autochunk] support parsing blocks (#2506) by oahzxl
- [autochunk] support autochunk on evoformer (#2497) by oahzxl
- [autochunk] support evoformer tracer (#2485) by oahzxl
- [autochunk] add autochunk feature by Jiarui Fang
Git
- [git] remove invalid submodule (#2540) by binmakeswell
Gemini
- [gemini] add profiler in the demo (#2534) by HELSON
- [gemini] update the gpt example (#2527) by HELSON
- [gemini] update ddp strict mode (#2518) by HELSON
- [gemini] add get static torch model (#2356) by HELSON
Example
- [example] Add fastfold tutorial (#2528) by [LuGY]...
Version v0.2.0 Release Today!
What's Changed
Version
- [version] 0.1.14 -> 0.2.0 (#2286) by Jiarui Fang
Examples
- [examples] using args and combining two versions for PaLM (#2284) by ZijianYY
- [examples] replace einsum with matmul (#2210) by ZijianYY
Doc
- [doc] add feature diffusion v2, bloom, auto-parallel (#2282) by binmakeswell
- [doc] updated the stable diffusion on docker usage (#2244) by Frank Lee
Zero
- [zero] polish low level zero optimizer (#2275) by HELSON
- [zero] fix error for BEiT models (#2169) by HELSON
Example
- [example] add benchmark (#2276) by Ziyue Jiang
- [example] fix save_load bug for dreambooth (#2280) by BlueRum
- [example] GPT polish readme (#2274) by Jiarui Fang
- [example] fix gpt example with 0.1.10 (#2265) by HELSON
- [example] clear diffuser image (#2262) by Fazzie-Maqianli
- [example] diffusion install from docker (#2239) by Jiarui Fang
- [example] fix benchmark.sh for gpt example (#2229) by HELSON
- [example] make palm + GeminiDPP work (#2227) by Jiarui Fang
- [example] Palm adding gemini, still has bugs (#2221) by ZijianYY
- [example] update gpt example (#2225) by HELSON
- [example] add benchmark.sh for gpt (#2226) by Jiarui Fang
- [example] update gpt benchmark (#2219) by HELSON
- [example] update GPT example benchmark results (#2212) by Jiarui Fang
- [example] update gpt example for larger model scale (#2211) by Jiarui Fang
- [example] update gpt readme with performance (#2206) by Jiarui Fang
- [example] polish doc (#2201) by ziyuhuang123
- [example] Change some training settings for diffusion (#2195) by BlueRum
- [example] support Dreambooth (#2188) by Fazzie-Maqianli
- [example] gpt demo more accuracy tflops (#2178) by Jiarui Fang
- [example] add palm pytorch version (#2172) by Jiarui Fang
- [example] update vit readme (#2155) by Jiarui Fang
- [example] add zero1, zero2 example in GPT examples (#2146) by HELSON
Hotfix
- [hotfix] fix fp16 optimizer bug (#2273) by YuliangLiu0306
- [hotfix] fix error for torch 2.0 (#2243) by xcnick
- [hotfix] Fixing the bug related to ipv6 support by Tongping Liu
- [hotfix] correct cpu_optim runtime compilation (#2197) by Jiarui Fang
- [hotfix] add kwargs for colo_addmm (#2171) by Tongping Liu
- [hotfix] Jit type hint #2161 (#2164) by アマデウス
- [hotfix] fix auto policy of test_sharded_optim_v2 (#2157) by Jiarui Fang
- [hotfix] fix aten default bug (#2158) by YuliangLiu0306
Autoparallel
- [autoparallel] fix spelling error (#2270) by YuliangLiu0306
- [autoparallel] gpt2 autoparallel examples (#2267) by YuliangLiu0306
- [autoparallel] patch torch.flatten metainfo for autoparallel (#2247) by Boyuan Yao
- [autoparallel] autoparallel initialize (#2238) by YuliangLiu0306
- [autoparallel] fix construct meta info. (#2245) by Super Daniel
- [autoparallel] record parameter attribute in colotracer (#2217) by YuliangLiu0306
- [autoparallel] Attach input, buffer and output tensor to MetaInfo class (#2162) by Boyuan Yao
- [autoparallel] new metainfoprop based on metainfo class (#2179) by Boyuan Yao
- [autoparallel] update getitem handler (#2207) by YuliangLiu0306
- [autoparallel] update_getattr_handler (#2193) by YuliangLiu0306
- [autoparallel] add gpt2 performance test code (#2194) by YuliangLiu0306
- [autoparallel] integrate_gpt_related_tests (#2134) by YuliangLiu0306
- [autoparallel] memory estimation for shape consistency (#2144) by Boyuan Yao
- [autoparallel] use metainfo in handler (#2149) by YuliangLiu0306
Gemini
- [Gemini] fix the convert_to_torch_module bug (#2269) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] Reduce comm redundancy by getting accurate output (#2232) by Ziyue Jiang
Builder
- [builder] builder for scaled_upper_triang_masked_softmax (#2234) by Jiarui Fang
- [builder] polish builder with better base class (#2216) by Jiarui Fang
- [builder] raise Error when CUDA_HOME is not set (#2213) by Jiarui Fang
- [builder] multihead attn runtime building (#2203) by Jiarui Fang
- [builder] unified cpu_optim fused_optim inferface (#2190) by Jiarui Fang
- [builder] use runtime builder for fused_optim (#2189) by Jiarui Fang
- [builder] runtime adam and fused_optim builder (#2184) by Jiarui Fang
- [builder] use builder() for cpu adam and fused optim in setup.py (#2187) by Jiarui Fang
Logger
- [logger] hotfix, missing _FORMAT (#2231) by Super Daniel
NFC
- [NFC] fix some typos (#2175) by ziyuhuang123
- [NFC] update news link (#2191) by binmakeswell
- [NFC] fix a typo 'stable-diffusion-typo-fine-tune' by Arsmart1
Example
- [example] diffuser, support quant inference for stable diffusion (#2186) by BlueRum
- [example] add vit missing functions (#2154) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] Fix deadlock when num_microbatch=num_stage (#2156) by Ziyue Jiang
Full Changelog: v0.1.13...v0.2.0
Version v0.1.13 Release Today!
What's Changed
Version
- [version] 0.1.13 (#2152) by Jiarui Fang
- Revert "[version] version to v0.1.13 (#2139)" (#2153) by Jiarui Fang
- [version] version to v0.1.13 (#2139) by Jiarui Fang
Gemini
- [Gemini] GeminiDPP convert to PyTorch Module. (#2151) by Jiarui Fang
- [Gemini] Update coloinit_ctx to support meta_tensor (#2147) by BlueRum
- [Gemini] revert ZeROInitCtx related tracer (#2138) by Jiarui Fang
- [Gemini] update API of the chunkmemstatscollector. (#2129) by Jiarui Fang
- [Gemini] update the non model data record method in runtime memory tracer (#2128) by Jiarui Fang
- [Gemini] test step-tensor mapping using repeated_computed_layers.py (#2127) by Jiarui Fang
- [Gemini] update non model data calculation method (#2126) by Jiarui Fang
- [Gemini] hotfix the unittest bugs (#2125) by Jiarui Fang
- [Gemini] mapping of preop timestep and param (#2124) by Jiarui Fang
- [Gemini] chunk init using runtime visited param order (#2115) by Jiarui Fang
- [Gemini] chunk init use OrderedParamGenerator (#2110) by Jiarui Fang
NFC
- [NFC] remove useless graph node code (#2150) by Jiarui Fang
- [NFC] update chunk manager API (#2119) by Jiarui Fang
- [NFC] polish comments for Chunk class (#2116) by Jiarui Fang
Autoparallel
- [autoparallel] process size nodes in runtime pass (#2130) by YuliangLiu0306
- [autoparallel] implement softmax handler (#2132) by YuliangLiu0306
- [autoparallel] gpt2lp runtime test (#2113) by YuliangLiu0306
Example
- Merge pull request #2120 from Fazziekey/example/stablediffusion-v2 by Fazzie-Maqianli
Pp middleware
- [PP Middleware] Add bwd and step for PP middleware (#2111) by Ziyue Jiang
Full Changelog: v0.1.12...v0.1.13
Version v0.1.12 Release Today!
What's Changed
Gemini
- [gemini] get the param visited order during runtime (#2108) by Jiarui Fang
- [Gemini] NFC, polish search_chunk_configuration (#2107) by Jiarui Fang
- [Gemini] gemini use the runtime memory tracer (RMT) (#2099) by Jiarui Fang
- [Gemini] make RuntimeMemTracer work correctly (#2096) by Jiarui Fang
- [Gemini] remove eval in gemini unittests! (#2092) by Jiarui Fang
- [Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091) by Jiarui Fang
- [Gemini] remove GLOBAL_CUDA_MEM_INFO (#2090) by Jiarui Fang
- [Gemini] use MemStats in Runtime Memory tracer (#2088) by Jiarui Fang
- [Gemini] use MemStats to store the tracing data. Separate it from Collector. (#2084) by Jiarui Fang
- [Gemini] remove static tracer (#2083) by Jiarui Fang
- [Gemini] ParamOpHook -> ColoParamOpHook (#2080) by Jiarui Fang
- [Gemini] polish runtime tracer tests (#2077) by Jiarui Fang
- [Gemini] rename hooks related to runtime mem tracer (#2076) by Jiarui Fang
- [Gemini] add albert in test models. (#2075) by Jiarui Fang
- [Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073) by Jiarui Fang
- [Gemini] remove not used MemtracerWrapper (#2072) by Jiarui Fang
- [Gemini] fix grad unreleased issue and param recovery issue (#2052) by Zihao
Hotfix
- [hotfix] fix a type in ColoInitContext (#2106) by Jiarui Fang
- [hotfix] update test for latest version (#2060) by YuliangLiu0306
- [hotfix] skip gpt tracing test (#2064) by YuliangLiu0306
Colotensor
- [ColoTensor] throw error when ColoInitContext meets meta parameter. (#2105) by Jiarui Fang
Autoparallel
- [autoparallel] support linear function bias addition (#2104) by YuliangLiu0306
- [autoparallel] support addbmm computation (#2102) by YuliangLiu0306
- [autoparallel] add sum handler (#2101) by YuliangLiu0306
- [autoparallel] add bias addition function class (#2098) by YuliangLiu0306
- [autoparallel] complete gpt related module search (#2097) by YuliangLiu0306
- [autoparallel] add embedding handler (#2089) by YuliangLiu0306
- [autoparallel] add tensor constructor handler (#2082) by YuliangLiu0306
- [autoparallel] add non_split linear strategy (#2078) by YuliangLiu0306
- [autoparallel] Add F.conv metainfo (#2069) by Boyuan Yao
- [autoparallel] complete gpt block searching (#2065) by YuliangLiu0306
- [autoparallel] add binary elementwise metainfo for auto parallel (#2058) by Boyuan Yao
- [autoparallel] fix forward memory calculation (#2062) by Boyuan Yao
- [autoparallel] adapt solver with self attention (#2037) by YuliangLiu0306
Version
- [version] 0.1.11rc5 -> 0.1.12 (#2103) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087) by Ziyue Jiang
- [Pipeline Middleware] Adapt scheduler for Topo (#2066) by Ziyue Jiang
Fx
- [fx] An experimental version of ColoTracer. (#2002) by Super Daniel
Device
- [device] update flatten device mesh usage (#2079) by YuliangLiu0306
Test
- [test] bert test in non-distributed way (#2074) by Jiarui Fang
Pipeline
- [Pipeline] Add Topo Class (#2059) by Ziyue Jiang
Examples
- [examples] update autoparallel demo (#2061) by YuliangLiu0306
Full Changelog: v0.1.11rc5...v0.1.12
Version v0.1.11rc5 Release Today!
What's Changed
Gemini
- [gemini] fix init bugs for modules (#2047) by HELSON
- [gemini] add arguments (#2046) by HELSON
- [Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040) by Zihao
- [Gemini] more tests for Gemini (#2038) by Jiarui Fang
- [Gemini] more rigorous unit tests for run_fwd_bwd (#2034) by Jiarui Fang
- [Gemini] paramWrapper paramTracerHook unittest (#2030) by Zihao
- [Gemini] patch for supporting torch.add_ function for ColoTensor (#2003) by Jiarui Fang
- [gemini] param_trace_hook (#2020) by Zihao
- [Gemini] add unitests to check gemini correctness (#2015) by Jiarui Fang
- [Gemini] ParamMemHook (#2008) by Zihao
- [Gemini] param_tracer_wrapper and test case (#2009) by Zihao
Test
- [test] align model name with the file name. (#2045) by Jiarui Fang
Hotfix
- [hotfix] hotfix Gemini for no leaf modules bug (#2043) by Jiarui Fang
- [hotfix] add bert test for gemini fwd bwd (#2035) by Jiarui Fang
- [hotfix] revert bug PRs (#2016) by Jiarui Fang
Zero
- [zero] fix testing parameters (#2042) by HELSON
- [zero] fix unit-tests (#2039) by HELSON
- [zero] test gradient accumulation (#1964) by HELSON
Rpc
- [rpc] split with dag (#2028) by Ziyue Jiang
Autoparallel
- [autoparallel] add split handler (#2032) by YuliangLiu0306
- [autoparallel] add experimental permute handler (#2029) by YuliangLiu0306
- [autoparallel] add runtime pass and numerical test for view handler (#2018) by YuliangLiu0306
- [autoparallel] add experimental view handler (#2011) by YuliangLiu0306
- [autoparallel] mix gather (#1977) by Genghan Zhang
Fx
- [fx] Split partition with DAG information (#2025) by Ziyue Jiang
Github
- [GitHub] update issue template (#2023) by binmakeswell
Full Changelog: v0.1.11rc4...v0.1.11rc5
Version v0.1.11rc4 Release Today!
What's Changed
Workflow
- [workflow] fixed the python and cpu arch mismatch (#2010) by Frank Lee
- [workflow] fixed the typo in condarc (#2006) by Frank Lee
- [workflow] added conda cache and fixed no-compilation bug in release (#2005) by Frank Lee
Gemini
- [Gemini] add an inline_op_module to common test models and polish unittests. (#2004) by Jiarui Fang
- [Gemini] open grad checkpoint when model building (#1984) by Jiarui Fang
- [Gemini] add bert for MemtracerWrapper unittests (#1982) by Jiarui Fang
- [Gemini] MemtracerWrapper unittests (#1981) by Jiarui Fang
- [Gemini] memory trace hook (#1978) by Jiarui Fang
- [Gemini] independent runtime tracer (#1974) by Jiarui Fang
- [Gemini] ZeROHookV2 -> GeminiZeROHook (#1972) by Jiarui Fang
- [Gemini] clean no used MemTraceOp (#1970) by Jiarui Fang
- [Gemini] polish memstats collector (#1962) by Jiarui Fang
- [Gemini] add GeminiAdamOptimizer (#1960) by Jiarui Fang
Autoparallel
- [autoparallel] Add metainfo support for F.linear (#1987) by Boyuan Yao
- [autoparallel] use pytree map style to process data (#1989) by YuliangLiu0306
- [autoparallel] adapt handlers with attention block (#1990) by YuliangLiu0306
- [autoparallel] support more flexible data type (#1967) by YuliangLiu0306
- [autoparallel] add pooling metainfo (#1968) by Boyuan Yao
- [autoparallel] support distributed dataloader option (#1906) by YuliangLiu0306
- [autoparallel] Add alpha beta (#1973) by Genghan Zhang
- [autoparallel] add torch.nn.ReLU metainfo (#1868) by Boyuan Yao
- [autoparallel] support addmm in tracer and solver (#1961) by YuliangLiu0306
- [autoparallel] remove redundancy comm node (#1893) by YuliangLiu0306
Fx
- [fx] add more meta_registry for MetaTensor execution. (#2000) by Super Daniel
Hotfix
- [hotfix] make Gemini work for conv DNN (#1998) by Jiarui Fang
Example
- [example] add diffusion inference (#1986) by Fazzie-Maqianli
- [example] enhance GPT demo (#1959) by Jiarui Fang
- [example] add vit (#1942) by Jiarui Fang
Polish
- [polish] remove useless file _mem_tracer_hook.py (#1963) by Jiarui Fang
Colotensor
- [ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) by Jiarui Fang
- [ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) by Jiarui Fang
Tutorial
- [tutorial] polish all README (#1946) by binmakeswell
- [tutorial] added missing dummy dataloader (#1944) by Frank Lee
- [tutorial] fixed pipeline bug for sequence parallel (#1943) by Frank Lee
Sc demo
- [sc demo] add requirements to spmd README (#1941) by YuliangLiu0306
Sc
- [SC] remove redundant hands on (#1939) by Boyuan Yao
Full Changelog: v0.1.11rc3...v0.1.11rc4
Version v0.1.11rc3 Release Today!
What's Changed
Tutorial
- [tutorial] polish README and OPT files (#1930) by binmakeswell
- [tutorial] add synthetic dataset for opt (#1924) by ver217
- [tutorial] updated hybrid parallel readme (#1928) by Frank Lee
- [tutorial] added synthetic data for sequence parallel (#1927) by Frank Lee
- [tutorial] removed huggingface model warning (#1925) by Frank Lee
- Hotfix/tutorial readme index (#1922) by Frank Lee
- [tutorial] modify hands-on of auto activation checkpoint (#1920) by Boyuan Yao
- [tutorial] added synthetic data for hybrid parallel (#1921) by Frank Lee
- [tutorial] added synthetic data for hybrid parallel (#1919) by Frank Lee
- [tutorial] added synthetic dataset for auto parallel demo (#1918) by Frank Lee
- [tutorial] updated auto parallel demo with latest data path (#1917) by Frank Lee
- [tutorial] added data script and updated readme (#1916) by Frank Lee
- [tutorial] add cifar10 for diffusion (#1907) by binmakeswell
- [tutorial] removed duplicated tutorials (#1904) by Frank Lee
- [tutorial] edited hands-on practices (#1899) by BoxiangW
Example
- [example] update auto_parallel img path (#1910) by binmakeswell
- [example] add cifar10 dataset for diffusion (#1902) by Fazzie-Maqianli
- [example] migrate diffusion and auto_parallel hands-on (#1871) by binmakeswell
- [example] initialize tutorial (#1865) by binmakeswell
- Merge pull request #1842 from feifeibear/jiarui/polish by Fazzie-Maqianli
- [example] polish diffusion readme by jiaruifang
Sc
- [SC] add GPT example for auto checkpoint (#1889) by Boyuan Yao
- [sc] add examples for auto checkpoint. (#1880) by Super Daniel
NFC
- [NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905) by Junming Wu
- [NFC] remove redundant dependency (#1869) by binmakeswell
- [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1856) by yuxuan-lou
- [NFC] polish .github/workflows/scripts/generate_release_draft.py code style (#1855) by Ofey Chan
- [NFC] polish workflows code style (#1854) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853) by LuGY
- [NFC] polish .readthedocs.yaml code style (#1852) by nuszzh
- [NFC] polish <.github/workflows/release_nightly.yml> code style (#1851) by RichardoLuo
- [NFC] polish amp.naive_amp.grad_scaler code style by zbian
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/operator_handler.py code style (#1845) by HELSON
- [NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836) by Genghan Zhang
- [NFC] polish .github/workflows/build.yml code style (#1837) by xyupeng
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/conv_handler.py code style (#1829) by Sze-qq
- [NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823) by Ziyue Jiang
- [NFC] polish .github/workflows/release_docker.yml code style by Maruyama_Aya
- [NFC] polish .github/workflows/submodule.yml code style (#1822) by shenggan
- [NFC] polish .github/workflows/draft_github_release_post.yml code style (#1820) by Arsmart1
- [NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819) by Fazzie-Maqianli
- [NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816) by CsRic
- [NFC] polish .github/workflows/build_gpu_8.yml code style (#1813) by Zangwei Zheng
- [NFC] polish MANIFEST.in code style (#1814) by Zirui Zhu
- [NFC] polish strategies_constructor.py code style (#1806) by binmakeswell
Doc
- [doc] add news (#1901) by binmakeswell
Autoparallel
- [autoparallel] user-friendly API for CheckpointSolver. (#1879) by Super Daniel
- [autoparallel] fix linear logical convert issue (#1857) by YuliangLiu0306
Fx
- [fx] metainfo_trace as an API. (#1873) by Super Daniel
Hotfix
- [hotfix] pass test_complete_workflow (#1877) by Jiarui Fang
Inference
- [inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876) by Jiarui Fang
- [inference] streaming Linear 1D Row inference (#1874) by Jiarui Fang
Utils
- [utils] fixed lazy init context (#1867) by Frank Lee
- [utils] remove lazy_memory_allocate from ColoInitContext (#1844) by Jiarui Fang
Full Changelog: v0.1.11rc2...v0.1.11rc3
Version v0.1.11rc2 Release Today!
What's Changed
Autoparallel
- [autoparallel] fix bugs caused by negative dim key (#1808) by YuliangLiu0306
- [autoparallel] fix bias addition module (#1800) by YuliangLiu0306
- [autoparallel] add batch norm metainfo (#1815) by Boyuan Yao
- [autoparallel] add conv metainfo class for auto parallel (#1796) by Boyuan Yao
- [autoparallel] add essential CommActions for broadcast operands (#1793) by YuliangLiu0306
- [autoparallel] refactor and add rotorc. (#1789) by Super Daniel
- [autoparallel] add getattr handler (#1767) by YuliangLiu0306
- [autoparallel] added matmul handler (#1763) by Frank Lee
- [autoparallel] fix conv handler numerical test (#1771) by YuliangLiu0306
- [autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764) by Super Daniel
- [autoparallel] add numerical test for handlers (#1769) by YuliangLiu0306
- [autoparallel] update CommSpec to CommActions (#1768) by YuliangLiu0306
- [autoparallel] add numerical test for node strategies (#1760) by YuliangLiu0306
- [autoparallel] refactor the runtime apply pass and add docstring to passes (#1757) by YuliangLiu0306
- [autoparallel] added binary elementwise node handler (#1758) by Frank Lee
- [autoparallel] fix param hook issue in transform pass (#1755) by YuliangLiu0306
- [autoparallel] added addbmm handler (#1751) by Frank Lee
- [autoparallel] shard param and buffer as expected (#1753) by YuliangLiu0306
- [autoparallel] add sequential order to communication actions (#1735) by YuliangLiu0306
- [autoparallel] recovered skipped test cases (#1748) by Frank Lee
- [autoparallel] fixed wrong sharding strategy in conv handler (#1747) by Frank Lee
- [autoparallel] fixed wrong generated strategy for dot op (#1746) by Frank Lee
- [autoparallel] handled illegal sharding strategy in shape consistency (#1744) by Frank Lee
- [autoparallel] handled illegal strategy in node handler (#1743) by Frank Lee
- [autoparallel] handled illegal sharding strategy (#1728) by Frank Lee
Kernel
- [kernel] added jit warmup (#1792) by アマデウス
- [kernel] more flexible flashatt interface (#1804) by oahzxl
- [kernel] skip tests of flash_attn and triton when they are not available (#1798) by Jiarui Fang
Gemini
- [Gemini] make gemini usage simple (#1821) by Jiarui Fang
Doc
- [doc] polish diffusion README (#1840) by binmakeswell
- [doc] remove obsolete API demo (#1833) by binmakeswell
- [doc] add diffusion (#1827) by binmakeswell
- [doc] add FastFold (#1766) by binmakeswell
Example
- [example] remove useless readme in diffusion (#1831) by Jiarui Fang
- [example] add TP to GPT example (#1828) by Jiarui Fang
- [example] add stable diffuser (#1825) by Fazzie-Maqianli
- [example] simplify the GPT2 huggingface example (#1826) by Jiarui Fang
- [example] opt does not depend on Titans (#1811) by Jiarui Fang
- [example] add GPT by Jiarui Fang
- [example] add opt model in language (#1809) by Jiarui Fang
- [example] add diffusion to example (#1805) by Jiarui Fang
NFC
- [NFC] update gitignore remove DS_Store (#1830) by Jiarui Fang
- [NFC] polish type hint for shape consistency (#1801) by Jiarui Fang
- [NFC] polish tests/test_layers/test_3d/test_3d.py code style (#1740) by Ziheng Qin
- [NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style (#1733) by lucasliunju
- [NFC] polish colossalai/nn/metric/_utils.py code style (#1727) by Sze-qq
- [NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style (#1731) by Xue Fuzhao
- [NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style (#1723) by xyupeng
- [NFC] polish accuracy_2d.py code style (#1719) by Ofey Chan
- [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1721) by Arsmart1
- [NFC] polish _checkpoint_hook.py code style (#1722) by LuGY
- [NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style (#1718) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/zero/sharded_param/__init__.py code style (#1717) by CsRic
- [NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716) by yuxuan-lou
- [NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715) by binmakeswell
- [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) by shenggan
Fx
- [fx] add a symbolic_trace api. (#1812) by Super Daniel
- [fx] skip diffusers unittest if it is not installed (#1799) by Jiarui Fang
- [fx] Add linear metainfo class for auto parallel (#1783) by Boyuan Yao
- [fx] support module with bias addition (#1780) by YuliangLiu0306
- [fx] refactor memory utils and extend shard utils. (#1754) by Super Daniel
- [fx] test tracer on diffuser modules. (#1750) by Super Daniel
Hotfix
- [hotfix] fix build error when torch version >= 1.13 (#1803) by xcnick
- [hotfix] polish flash attention (#1802) by oahzxl
- [hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786) by HELSON
- [hotfix] polish chunk import (#1787) by Jiarui Fang
- [hotfix] autoparallel unit test (#1752) by YuliangLiu0306
Pipeline
- [Pipeline] Adapt to Pipelinable OPT (#1782) by Ziyue Jiang
Ci
- [CI] downgrade fbgemm. (#1778) by Super Daniel
Compatibility
- [compatibility] ChunkMgr import error (#1772) by Jiarui Fang
Fx/profiler
- [fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730) by Super Daniel
Full Changelog: v0.1.11rc1...v0.1.11rc2