Releases: hpcaitech/ColossalAI
Version v0.2.3 Release Today!
What's Changed
Doc
- [doc] add CVPR tutorial (#2666) by binmakeswell
Docs
- [Docs] layout converting management (#2665) by YuliangLiu0306
Autoparallel
- [autoparallel] Patch meta information of `torch.nn.LayerNorm` (#2647) by Boyuan Yao
Full Changelog: v0.2.2...v0.2.3
Version v0.2.2 Release Today!
What's Changed
Workflow
- [workflow] fixed gpu memory check condition (#2659) by Frank Lee
- [workflow] fixed the test coverage report (#2614) by Frank Lee
- [workflow] fixed test coverage report (#2611) by Frank Lee
Example
- [example] Polish README.md (#2658) by Jiatong (Julius) Han
Doc
- [doc] fixed compatibility with docusaurus (#2657) by Frank Lee
- [doc] added docusaurus-based version control (#2656) by Frank Lee
- [doc] migrate the markdown files (#2652) by Frank Lee
- [doc] fix typo of BLOOM (#2643) by binmakeswell
- [doc] removed pre-built wheel installation from readme (#2637) by Frank Lee
- [doc] updated the sphinx theme (#2635) by Frank Lee
- [doc] fixed broken badge (#2623) by Frank Lee
Autoparallel
- [autoparallel] refactor handlers which reshape input tensors (#2615) by YuliangLiu0306
- [autoparallel] adapt autoparallel tests with latest api (#2626) by YuliangLiu0306
- [autoparallel] Patch meta information of `torch.matmul` (#2584) by Boyuan Yao
Tutorial
- [tutorial] added energonai to opt inference requirements (#2625) by Frank Lee
- [tutorial] add video link (#2619) by binmakeswell
Full Changelog: v0.2.1...v0.2.2
Version v0.2.1 Release Today!
What's Changed
Workflow
- [workflow] fixed broken release workflows (#2604) by Frank Lee
- [workflow] added cuda extension build test before release (#2598) by Frank Lee
- [workflow] hooked pypi release with lark (#2596) by Frank Lee
- [workflow] hooked docker release with lark (#2594) by Frank Lee
- [workflow] added test-pypi check before release (#2591) by Frank Lee
- [workflow] fixed the typo in the example check workflow (#2589) by Frank Lee
- [workflow] hook compatibility test failure to lark (#2586) by Frank Lee
- [workflow] hook example test alert with lark (#2585) by Frank Lee
- [workflow] added notification if scheduled build fails (#2574) by Frank Lee
- [workflow] added discussion stats to community report (#2572) by Frank Lee
- [workflow] refactored compatibility test workflow for maintainability (#2560) by Frank Lee
- [workflow] adjust the GPU memory threshold for scheduled unit test (#2558) by Frank Lee
- [workflow] fixed example check workflow (#2554) by Frank Lee
- [workflow] fixed typos in the leaderboard workflow (#2567) by Frank Lee
- [workflow] added contributor and user-engagement report (#2564) by Frank Lee
- [workflow] only report coverage for changed files (#2524) by Frank Lee
- [workflow] fixed the precommit CI (#2525) by Frank Lee
- [workflow] fixed changed file detection (#2515) by Frank Lee
- [workflow] fixed the skip condition of example weekly check workflow (#2481) by Frank Lee
- [workflow] automated bdist wheel build (#2459) by Frank Lee
- [workflow] automated the compatibility test (#2453) by Frank Lee
- [workflow] fixed the on-merge condition check (#2452) by Frank Lee
- [workflow] make test coverage report collapsable (#2436) by Frank Lee
- [workflow] report test coverage even if below threshold (#2431) by Frank Lee
- [workflow] auto comment with test coverage report (#2419) by Frank Lee
- [workflow] auto comment if precommit check fails (#2417) by Frank Lee
- [workflow] added translation for non-english comments (#2414) by Frank Lee
- [workflow] added precommit check for code consistency (#2401) by Frank Lee
- [workflow] refactored the example check workflow (#2411) by Frank Lee
- [workflow] added nightly release to pypi (#2403) by Frank Lee
- [workflow] added missing file change detection output (#2387) by Frank Lee
- [workflow] New version: Create workflow files for examples' auto check (#2298) by ziyuhuang123
- [workflow] fixed pypi release workflow error (#2328) by Frank Lee
- [workflow] fixed pypi release workflow error (#2327) by Frank Lee
- [workflow] added workflow to release to pypi upon version change (#2320) by Frank Lee
- [workflow] removed unused assign reviewer workflow (#2318) by Frank Lee
- [workflow] rebuild cuda kernels when kernel-related files change (#2317) by Frank Lee
Doc
- [doc] updated readme for CI/CD (#2600) by Frank Lee
- [doc] fixed issue link in pr template (#2577) by Frank Lee
- [doc] updated the CHANGE_LOG.md for github release page (#2552) by Frank Lee
- [doc] fixed the typo in pr template (#2556) by Frank Lee
- [doc] added pull request template (#2550) by Frank Lee
- [doc] update example link (#2520) by binmakeswell
- [doc] update opt and tutorial links (#2509) by binmakeswell
- [doc] added documentation for CI/CD (#2420) by Frank Lee
- [doc] updated kernel-related optimisers' docstring (#2385) by Frank Lee
- [doc] updated readme regarding pypi installation (#2406) by Frank Lee
- [doc] hotfix #2377 by Jiarui Fang
- [doc] update stable diffusion link (#2322) by binmakeswell
- [doc] update diffusion doc (#2296) by binmakeswell
- [doc] update news (#2295) by binmakeswell
- [doc] update news by binmakeswell
Setup
- [setup] fixed inconsistent version meta (#2578) by Frank Lee
- [setup] refactored setup.py for dependency graph (#2413) by Frank Lee
- [setup] support pre-build and jit-build of cuda kernels (#2374) by Frank Lee
- [setup] make cuda extension build optional (#2336) by Frank Lee
- [setup] remove torch dependency (#2333) by Frank Lee
- [setup] removed the build dependency on colossalai (#2307) by Frank Lee
Tutorial
- [tutorial] polish README (#2568) by binmakeswell
- [tutorial] update fastfold tutorial (#2565) by oahzxl
Polish
- [polish] polish ColoTensor and its submodules (#2537) by HELSON
- [polish] polish code for get_static_torch_model (#2405) by HELSON
Hotfix
- [hotfix] fix zero ddp warmup check (#2545) by ver217
- [hotfix] fix autoparallel demo (#2533) by YuliangLiu0306
- [hotfix] fix lightning error (#2529) by HELSON
- [hotfix] meta tensor default device. (#2510) by Super Daniel
- [hotfix] gpt example titans bug #2493 (#2494) by Jiarui Fang
- [hotfix] add norm clearing for the overflow step (#2416) by HELSON
- [hotfix] add DISTPAN argument for benchmark (#2412) by HELSON
- [hotfix] fix gpt gemini example (#2404) by HELSON
- [hotfix] issue #2388 by Jiarui Fang
- [hotfix] fix implement error in diffusers by Jiarui Fang
Autochunk
- [autochunk] add benchmark for transformer and alphafold (#2543) by oahzxl
- [autochunk] support multi outputs chunk search (#2538) by oahzxl
- [autochunk] support transformer (#2526) by oahzxl
- [autochunk] support parsing blocks (#2506) by oahzxl
- [autochunk] support autochunk on evoformer (#2497) by oahzxl
- [autochunk] support evoformer tracer (#2485) by oahzxl
- [autochunk] add autochunk feature by Jiarui Fang
Git
- [git] remove invalid submodule (#2540) by binmakeswell
Gemini
- [gemini] add profiler in the demo (#2534) by HELSON
- [gemini] update the gpt example (#2527) by HELSON
- [gemini] update ddp strict mode (#2518) by HELSON
- [gemini] add get static torch model (#2356) by HELSON
Example
- [example] Add fastfold tutorial (#2528) by [LuGY]...
Version v0.2.0 Release Today!
What's Changed
Version
- [version] 0.1.14 -> 0.2.0 (#2286) by Jiarui Fang
Examples
- [examples] using args and combining two versions for PaLM (#2284) by ZijianYY
- [examples] replace einsum with matmul (#2210) by ZijianYY
Doc
- [doc] add feature diffusion v2, bloom, auto-parallel (#2282) by binmakeswell
- [doc] updated the stable diffusion on docker usage (#2244) by Frank Lee
Zero
- [zero] polish low level zero optimizer (#2275) by HELSON
- [zero] fix error for BEiT models (#2169) by HELSON
Example
- [example] add benchmark (#2276) by Ziyue Jiang
- [example] fix save_load bug for dreambooth (#2280) by BlueRum
- [example] GPT polish readme (#2274) by Jiarui Fang
- [example] fix gpt example with 0.1.10 (#2265) by HELSON
- [example] clear diffuser image (#2262) by Fazzie-Maqianli
- [example] diffusion install from docker (#2239) by Jiarui Fang
- [example] fix benchmark.sh for gpt example (#2229) by HELSON
- [example] make palm + GeminiDPP work (#2227) by Jiarui Fang
- [example] Palm adding gemini, still has bugs (#2221) by ZijianYY
- [example] update gpt example (#2225) by HELSON
- [example] add benchmark.sh for gpt (#2226) by Jiarui Fang
- [example] update gpt benchmark (#2219) by HELSON
- [example] update GPT example benchmark results (#2212) by Jiarui Fang
- [example] update gpt example for larger model scale (#2211) by Jiarui Fang
- [example] update gpt readme with performance (#2206) by Jiarui Fang
- [example] polish doc (#2201) by ziyuhuang123
- [example] Change some training settings for diffusion (#2195) by BlueRum
- [example] support Dreambooth (#2188) by Fazzie-Maqianli
- [example] gpt demo more accuracy tflops (#2178) by Jiarui Fang
- [example] add palm pytorch version (#2172) by Jiarui Fang
- [example] update vit readme (#2155) by Jiarui Fang
- [example] add zero1, zero2 example in GPT examples (#2146) by HELSON
Hotfix
- [hotfix] fix fp16 optimizer bug (#2273) by YuliangLiu0306
- [hotfix] fix error for torch 2.0 (#2243) by xcnick
- [hotfix] Fixing the bug related to ipv6 support by Tongping Liu
- [hotfix] correct cpu_optim runtime compilation (#2197) by Jiarui Fang
- [hotfix] add kwargs for colo_addmm (#2171) by Tongping Liu
- [hotfix] Jit type hint #2161 (#2164) by アマデウス
- [hotfix] fix auto policy of test_sharded_optim_v2 (#2157) by Jiarui Fang
- [hotfix] fix aten default bug (#2158) by YuliangLiu0306
Autoparallel
- [autoparallel] fix spelling error (#2270) by YuliangLiu0306
- [autoparallel] gpt2 autoparallel examples (#2267) by YuliangLiu0306
- [autoparallel] patch torch.flatten metainfo for autoparallel (#2247) by Boyuan Yao
- [autoparallel] autoparallel initialize (#2238) by YuliangLiu0306
- [autoparallel] fix construct meta info. (#2245) by Super Daniel
- [autoparallel] record parameter attribute in colotracer (#2217) by YuliangLiu0306
- [autoparallel] Attach input, buffer and output tensor to MetaInfo class (#2162) by Boyuan Yao
- [autoparallel] new metainfoprop based on metainfo class (#2179) by Boyuan Yao
- [autoparallel] update getitem handler (#2207) by YuliangLiu0306
- [autoparallel] update_getattr_handler (#2193) by YuliangLiu0306
- [autoparallel] add gpt2 performance test code (#2194) by YuliangLiu0306
- [autoparallel] integrate_gpt_related_tests (#2134) by YuliangLiu0306
- [autoparallel] memory estimation for shape consistency (#2144) by Boyuan Yao
- [autoparallel] use metainfo in handler (#2149) by YuliangLiu0306
Gemini
- [Gemini] fix the convert_to_torch_module bug (#2269) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] Reduce comm redundancy by getting accurate output (#2232) by Ziyue Jiang
Builder
- [builder] builder for scaled_upper_triang_masked_softmax (#2234) by Jiarui Fang
- [builder] polish builder with better base class (#2216) by Jiarui Fang
- [builder] raise Error when CUDA_HOME is not set (#2213) by Jiarui Fang
- [builder] multihead attn runtime building (#2203) by Jiarui Fang
- [builder] unified cpu_optim fused_optim inferface (#2190) by Jiarui Fang
- [builder] use runtime builder for fused_optim (#2189) by Jiarui Fang
- [builder] runtime adam and fused_optim builder (#2184) by Jiarui Fang
- [builder] use builder() for cpu adam and fused optim in setup.py (#2187) by Jiarui Fang
Logger
- [logger] hotfix, missing _FORMAT (#2231) by Super Daniel
NFC
- [NFC] fix some typos (#2175) by ziyuhuang123
- [NFC] update news link (#2191) by binmakeswell
- [NFC] fix a typo 'stable-diffusion-typo-fine-tune' by Arsmart1
Example
- [example] diffuser, support quant inference for stable diffusion (#2186) by BlueRum
- [example] add vit missing functions (#2154) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] Fix deadlock when num_microbatch=num_stage (#2156) by Ziyue Jiang
Full Changelog: v0.1.13...v0.2.0
Version v0.1.13 Release Today!
What's Changed
Version
- [version] 0.1.13 (#2152) by Jiarui Fang
- Revert "[version] version to v0.1.13 (#2139)" (#2153) by Jiarui Fang
- [version] version to v0.1.13 (#2139) by Jiarui Fang
Gemini
- [Gemini] GeminiDPP convert to PyTorch Module. (#2151) by Jiarui Fang
- [Gemini] Update coloinit_ctx to support meta_tensor (#2147) by BlueRum
- [Gemini] revert ZeROInitCtx related tracer (#2138) by Jiarui Fang
- [Gemini] update API of the chunkmemstatscollector. (#2129) by Jiarui Fang
- [Gemini] update the non model data record method in runtime memory tracer (#2128) by Jiarui Fang
- [Gemini] test step-tensor mapping using repeated_computed_layers.py (#2127) by Jiarui Fang
- [Gemini] update non model data calculation method (#2126) by Jiarui Fang
- [Gemini] hotfix the unittest bugs (#2125) by Jiarui Fang
- [Gemini] mapping of preop timestep and param (#2124) by Jiarui Fang
- [Gemini] chunk init using runtime visited param order (#2115) by Jiarui Fang
- [Gemini] chunk init use OrderedParamGenerator (#2110) by Jiarui Fang
NFC
- [NFC] remove useless graph node code (#2150) by Jiarui Fang
- [NFC] update chunk manager API (#2119) by Jiarui Fang
- [NFC] polish comments for Chunk class (#2116) by Jiarui Fang
Autoparallel
- [autoparallel] process size nodes in runtime pass (#2130) by YuliangLiu0306
- [autoparallel] implement softmax handler (#2132) by YuliangLiu0306
- [autoparallel] gpt2lp runtime test (#2113) by YuliangLiu0306
Example
- Merge pull request #2120 from Fazziekey/example/stablediffusion-v2 by Fazzie-Maqianli
Pp middleware
- [PP Middleware] Add bwd and step for PP middleware (#2111) by Ziyue Jiang
Full Changelog: v0.1.12...v0.1.13
Version v0.1.12 Release Today!
What's Changed
Gemini
- [gemini] get the param visited order during runtime (#2108) by Jiarui Fang
- [Gemini] NFC, polish search_chunk_configuration (#2107) by Jiarui Fang
- [Gemini] gemini use the runtime memory tracer (RMT) (#2099) by Jiarui Fang
- [Gemini] make RuntimeMemTracer work correctly (#2096) by Jiarui Fang
- [Gemini] remove eval in gemini unittests! (#2092) by Jiarui Fang
- [Gemini] remove GLOBAL_MODEL_DATA_TRACER (#2091) by Jiarui Fang
- [Gemini] remove GLOBAL_CUDA_MEM_INFO (#2090) by Jiarui Fang
- [Gemini] use MemStats in Runtime Memory tracer (#2088) by Jiarui Fang
- [Gemini] use MemStats to store the tracing data. Separate it from Collector. (#2084) by Jiarui Fang
- [Gemini] remove static tracer (#2083) by Jiarui Fang
- [Gemini] ParamOpHook -> ColoParamOpHook (#2080) by Jiarui Fang
- [Gemini] polish runtime tracer tests (#2077) by Jiarui Fang
- [Gemini] rename hooks related to runtime mem tracer (#2076) by Jiarui Fang
- [Gemini] add albert in test models. (#2075) by Jiarui Fang
- [Gemini] rename ParamTracerWrapper -> RuntimeMemTracer (#2073) by Jiarui Fang
- [Gemini] remove not used MemtracerWrapper (#2072) by Jiarui Fang
- [Gemini] fix grad unreleased issue and param recovery issue (#2052) by Zihao
Hotfix
- [hotfix] fix a type in ColoInitContext (#2106) by Jiarui Fang
- [hotfix] update test for latest version (#2060) by YuliangLiu0306
- [hotfix] skip gpt tracing test (#2064) by YuliangLiu0306
Colotensor
- [ColoTensor] throw error when ColoInitContext meets meta parameter. (#2105) by Jiarui Fang
Autoparallel
- [autoparallel] support linear function bias addition (#2104) by YuliangLiu0306
- [autoparallel] support addbmm computation (#2102) by YuliangLiu0306
- [autoparallel] add sum handler (#2101) by YuliangLiu0306
- [autoparallel] add bias addition function class (#2098) by YuliangLiu0306
- [autoparallel] complete gpt related module search (#2097) by YuliangLiu0306
- [autoparallel] add embedding handler (#2089) by YuliangLiu0306
- [autoparallel] add tensor constructor handler (#2082) by YuliangLiu0306
- [autoparallel] add non_split linear strategy (#2078) by YuliangLiu0306
- [autoparallel] Add F.conv metainfo (#2069) by Boyuan Yao
- [autoparallel] complete gpt block searching (#2065) by YuliangLiu0306
- [autoparallel] add binary elementwise metainfo for auto parallel (#2058) by Boyuan Yao
- [autoparallel] fix forward memory calculation (#2062) by Boyuan Yao
- [autoparallel] adapt solver with self attention (#2037) by YuliangLiu0306
Version
- [version] 0.1.11rc5 -> 0.1.12 (#2103) by Jiarui Fang
Pipeline middleware
- [Pipeline Middleware] fix data race in Pipeline Scheduler for DAG (#2087) by Ziyue Jiang
- [Pipeline Middleware] Adapt scheduler for Topo (#2066) by Ziyue Jiang
Fx
- [fx] An experimental version of ColoTracer. (#2002) by Super Daniel
Device
- [device] update flatten device mesh usage (#2079) by YuliangLiu0306
Test
- [test] bert test in non-distributed way (#2074) by Jiarui Fang
Pipeline
- [Pipeline] Add Topo Class (#2059) by Ziyue Jiang
Examples
- [examples] update autoparallel demo (#2061) by YuliangLiu0306
Full Changelog: v0.1.11rc5...v0.1.12
Version v0.1.11rc5 Release Today!
What's Changed
Gemini
- [gemini] fix init bugs for modules (#2047) by HELSON
- [gemini] add arguments (#2046) by HELSON
- [Gemini] free and allocate cuda memory by tensor.storage, add grad hook (#2040) by Zihao
- [Gemini] more tests for Gemini (#2038) by Jiarui Fang
- [Gemini] more rigorous unit tests for run_fwd_bwd (#2034) by Jiarui Fang
- [Gemini] paramWrapper paramTracerHook unittest (#2030) by Zihao
- [Gemini] patch for supporting torch.add_ function for ColoTensor (#2003) by Jiarui Fang
- [gemini] param_trace_hook (#2020) by Zihao
- [Gemini] add unitests to check gemini correctness (#2015) by Jiarui Fang
- [Gemini] ParamMemHook (#2008) by Zihao
- [Gemini] param_tracer_wrapper and test case (#2009) by Zihao
Test
- [test] align model name with the file name. (#2045) by Jiarui Fang
Hotfix
- [hotfix] hotfix Gemini for no leaf modules bug (#2043) by Jiarui Fang
- [hotfix] add bert test for gemini fwd bwd (#2035) by Jiarui Fang
- [hotfix] revert bug PRs (#2016) by Jiarui Fang
Zero
- [zero] fix testing parameters (#2042) by HELSON
- [zero] fix unit-tests (#2039) by HELSON
- [zero] test gradient accumulation (#1964) by HELSON
Rpc
- [rpc] split with dag (#2028) by Ziyue Jiang
Autoparallel
- [autoparallel] add split handler (#2032) by YuliangLiu0306
- [autoparallel] add experimental permute handler (#2029) by YuliangLiu0306
- [autoparallel] add runtime pass and numerical test for view handler (#2018) by YuliangLiu0306
- [autoparallel] add experimental view handler (#2011) by YuliangLiu0306
- [autoparallel] mix gather (#1977) by Genghan Zhang
Fx
- [fx] Split partition with DAG information (#2025) by Ziyue Jiang
Github
- [GitHub] update issue template (#2023) by binmakeswell
Full Changelog: v0.1.11rc4...v0.1.11rc5
Version v0.1.11rc4 Release Today!
What's Changed
Workflow
- [workflow] fixed the python and cpu arch mismatch (#2010) by Frank Lee
- [workflow] fixed the typo in condarc (#2006) by Frank Lee
- [workflow] added conda cache and fixed no-compilation bug in release (#2005) by Frank Lee
Gemini
- [Gemini] add an inline_op_module to common test models and polish unittests. (#2004) by Jiarui Fang
- [Gemini] open grad checkpoint when model building (#1984) by Jiarui Fang
- [Gemini] add bert for MemtracerWrapper unittests (#1982) by Jiarui Fang
- [Gemini] MemtracerWrapper unittests (#1981) by Jiarui Fang
- [Gemini] memory trace hook (#1978) by Jiarui Fang
- [Gemini] independent runtime tracer (#1974) by Jiarui Fang
- [Gemini] ZeROHookV2 -> GeminiZeROHook (#1972) by Jiarui Fang
- [Gemini] clean no used MemTraceOp (#1970) by Jiarui Fang
- [Gemini] polish memstats collector (#1962) by Jiarui Fang
- [Gemini] add GeminiAdamOptimizer (#1960) by Jiarui Fang
Autoparallel
- [autoparallel] Add metainfo support for F.linear (#1987) by Boyuan Yao
- [autoparallel] use pytree map style to process data (#1989) by YuliangLiu0306
- [autoparallel] adapt handlers with attention block (#1990) by YuliangLiu0306
- [autoparallel] support more flexible data type (#1967) by YuliangLiu0306
- [autoparallel] add pooling metainfo (#1968) by Boyuan Yao
- [autoparallel] support distributed dataloader option (#1906) by YuliangLiu0306
- [autoparallel] Add alpha beta (#1973) by Genghan Zhang
- [autoparallel] add torch.nn.ReLU metainfo (#1868) by Boyuan Yao
- [autoparallel] support addmm in tracer and solver (#1961) by YuliangLiu0306
- [autoparallel] remove redundancy comm node (#1893) by YuliangLiu0306
Fx
- [fx] add more meta_registry for MetaTensor execution. (#2000) by Super Daniel
Hotfix
- [hotfix] make Gemini work for conv DNN (#1998) by Jiarui Fang
Example
- [example] add diffusion inference (#1986) by Fazzie-Maqianli
- [example] enhance GPT demo (#1959) by Jiarui Fang
- [example] add vit (#1942) by Jiarui Fang
Polish
- [polish] remove useless file _mem_tracer_hook.py (#1963) by Jiarui Fang
Colotensor
- [ColoTensor] reconfig ColoInitContext, decouple default_pg and default_dist_spec. (#1953) by Jiarui Fang
- [ColoTensor] ColoInitContext initialize parameters in shard mode. (#1937) by Jiarui Fang
Tutorial
- [tutorial] polish all README (#1946) by binmakeswell
- [tutorial] added missing dummy dataloader (#1944) by Frank Lee
- [tutorial] fixed pipeline bug for sequence parallel (#1943) by Frank Lee
Sc demo
- [sc demo] add requirements to spmd README (#1941) by YuliangLiu0306
Sc
- [SC] remove redundant hands on (#1939) by Boyuan Yao
Full Changelog: v0.1.11rc3...v0.1.11rc4
Version v0.1.11rc3 Release Today!
What's Changed
Tutorial
- [tutorial] polish README and OPT files (#1930) by binmakeswell
- [tutorial] add synthetic dataset for opt (#1924) by ver217
- [tutorial] updated hybrid parallel readme (#1928) by Frank Lee
- [tutorial] added synthetic data for sequence parallel (#1927) by Frank Lee
- [tutorial] removed huggingface model warning (#1925) by Frank Lee
- Hotfix/tutorial readme index (#1922) by Frank Lee
- [tutorial] modify hands-on of auto activation checkpoint (#1920) by Boyuan Yao
- [tutorial] added synthetic data for hybrid parallel (#1921) by Frank Lee
- [tutorial] added synthetic data for hybrid parallel (#1919) by Frank Lee
- [tutorial] added synthetic dataset for auto parallel demo (#1918) by Frank Lee
- [tutorial] updated auto parallel demo with latest data path (#1917) by Frank Lee
- [tutorial] added data script and updated readme (#1916) by Frank Lee
- [tutorial] add cifar10 for diffusion (#1907) by binmakeswell
- [tutorial] removed duplicated tutorials (#1904) by Frank Lee
- [tutorial] edited hands-on practices (#1899) by BoxiangW
Example
- [example] update auto_parallel img path (#1910) by binmakeswell
- [example] add cifar10 dataset for diffusion (#1902) by Fazzie-Maqianli
- [example] migrate diffusion and auto_parallel hands-on (#1871) by binmakeswell
- [example] initialize tutorial (#1865) by binmakeswell
- Merge pull request #1842 from feifeibear/jiarui/polish by Fazzie-Maqianli
- [example] polish diffusion readme by jiaruifang
Sc
- [SC] add GPT example for auto checkpoint (#1889) by Boyuan Yao
- [sc] add examples for auto checkpoint. (#1880) by Super Daniel
NFC
- [NFC] polish colossalai/amp/naive_amp/__init__.py code style (#1905) by Junming Wu
- [NFC] remove redundant dependency (#1869) by binmakeswell
- [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1856) by yuxuan-lou
- [NFC] polish .github/workflows/scripts/generate_release_draft.py code style (#1855) by Ofey Chan
- [NFC] polish workflows code style (#1854) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/amp/apex_amp/__init__.py code style (#1853) by LuGY
- [NFC] polish .readthedocs.yaml code style (#1852) by nuszzh
- [NFC] polish <.github/workflows/release_nightly.yml> code style (#1851) by RichardoLuo
- [NFC] polish amp.naive_amp.grad_scaler code style by zbian
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/operator_handler.py code style (#1845) by HELSON
- [NFC] polish ./colossalai/amp/torch_amp/__init__.py code style (#1836) by Genghan Zhang
- [NFC] polish .github/workflows/build.yml code style (#1837) by xyupeng
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/conv_handler.py code style (#1829) by Sze-qq
- [NFC] polish colossalai/amp/torch_amp/_grad_scaler.py code style (#1823) by Ziyue Jiang
- [NFC] polish .github/workflows/release_docker.yml code style by Maruyama_Aya
- [NFC] polish .github/workflows/submodule.yml code style (#1822) by shenggan
- [NFC] polish .github/workflows/draft_github_release_post.yml code style (#1820) by Arsmart1
- [NFC] polish colossalai/amp/naive_amp/_fp16_optimizer.py code style (#1819) by Fazzie-Maqianli
- [NFC] polish colossalai/amp/naive_amp/_utils.py code style (#1816) by CsRic
- [NFC] polish .github/workflows/build_gpu_8.yml code style (#1813) by Zangwei Zheng
- [NFC] polish MANIFEST.in code style (#1814) by Zirui Zhu
- [NFC] polish strategies_constructor.py code style (#1806) by binmakeswell
Doc
- [doc] add news (#1901) by binmakeswell
Autoparallel
- [autoparallel] user-friendly API for CheckpointSolver. (#1879) by Super Daniel
- [autoparallel] fix linear logical convert issue (#1857) by YuliangLiu0306
Fx
- [fx] metainfo_trace as an API. (#1873) by Super Daniel
Hotfix
- [hotfix] pass test_complete_workflow (#1877) by Jiarui Fang
Inference
- [inference] overlap comm and compute in Linear1D_Row when stream_chunk_num > 1 (#1876) by Jiarui Fang
- [inference] streaming Linear 1D Row inference (#1874) by Jiarui Fang
Utils
- [utils] fixed lazy init context (#1867) by Frank Lee
- [utils] remove lazy_memory_allocate from ColoInitContext (#1844) by Jiarui Fang
Full Changelog: v0.1.11rc2...v0.1.11rc3
Version v0.1.11rc2 Release Today!
What's Changed
Autoparallel
- [autoparallel] fix bugs caused by negative dim key (#1808) by YuliangLiu0306
- [autoparallel] fix bias addition module (#1800) by YuliangLiu0306
- [autoparallel] add batch norm metainfo (#1815) by Boyuan Yao
- [autoparallel] add conv metainfo class for auto parallel (#1796) by Boyuan Yao
- [autoparallel] add essential CommActions for broadcast operands (#1793) by YuliangLiu0306
- [autoparallel] refactor and add rotorc. (#1789) by Super Daniel
- [autoparallel] add getattr handler (#1767) by YuliangLiu0306
- [autoparallel] added matmul handler (#1763) by Frank Lee
- [autoparallel] fix conv handler numerical test (#1771) by YuliangLiu0306
- [autoparallel] move ckpt solvers to autoparallel folder / refactor code (#1764) by Super Daniel
- [autoparallel] add numerical test for handlers (#1769) by YuliangLiu0306
- [autoparallel] update CommSpec to CommActions (#1768) by YuliangLiu0306
- [autoparallel] add numerical test for node strategies (#1760) by YuliangLiu0306
- [autoparallel] refactor the runtime apply pass and add docstring to passes (#1757) by YuliangLiu0306
- [autoparallel] added binary elementwise node handler (#1758) by Frank Lee
- [autoparallel] fix param hook issue in transform pass (#1755) by YuliangLiu0306
- [autoparallel] added addbmm handler (#1751) by Frank Lee
- [autoparallel] shard param and buffer as expected (#1753) by YuliangLiu0306
- [autoparallel] add sequential order to communication actions (#1735) by YuliangLiu0306
- [autoparallel] recovered skipped test cases (#1748) by Frank Lee
- [autoparallel] fixed wrong sharding strategy in conv handler (#1747) by Frank Lee
- [autoparallel] fixed wrong generated strategy for dot op (#1746) by Frank Lee
- [autoparallel] handled illegal sharding strategy in shape consistency (#1744) by Frank Lee
- [autoparallel] handled illegal strategy in node handler (#1743) by Frank Lee
- [autoparallel] handled illegal sharding strategy (#1728) by Frank Lee
Kernel
- [kernel] added jit warmup (#1792) by アマデウス
- [kernel] more flexible flashatt interface (#1804) by oahzxl
- [kernel] skip tests of flash_attn and triton when they are not available (#1798) by Jiarui Fang
Gemini
- [Gemini] make gemini usage simple (#1821) by Jiarui Fang
Doc
- [doc] polish diffusion README (#1840) by binmakeswell
- [doc] remove obsolete API demo (#1833) by binmakeswell
- [doc] add diffusion (#1827) by binmakeswell
- [doc] add FastFold (#1766) by binmakeswell
Example
- [example] remove useless readme in diffusion (#1831) by Jiarui Fang
- [example] add TP to GPT example (#1828) by Jiarui Fang
- [example] add stable diffuser (#1825) by Fazzie-Maqianli
- [example] simplify the GPT2 huggingface example (#1826) by Jiarui Fang
- [example] opt does not depend on Titans (#1811) by Jiarui Fang
- [example] add GPT by Jiarui Fang
- [example] add opt model in language (#1809) by Jiarui Fang
- [example] add diffusion to example (#1805) by Jiarui Fang
NFC
- [NFC] update gitignore remove DS_Store (#1830) by Jiarui Fang
- [NFC] polish type hint for shape consistency (#1801) by Jiarui Fang
- [NFC] polish tests/test_layers/test_3d/test_3d.py code style (#1740) by Ziheng Qin
- [NFC] polish tests/test_layers/test_3d/checks_3d/common.py code style (#1733) by lucasliunju
- [NFC] polish colossalai/nn/metric/_utils.py code style (#1727) by Sze-qq
- [NFC] polish tests/test_layers/test_3d/checks_3d/check_layer_3d.py code style (#1731) by Xue Fuzhao
- [NFC] polish tests/test_layers/test_sequence/checks_seq/check_layer_seq.py code style (#1723) by xyupeng
- [NFC] polish accuracy_2d.py code style (#1719) by Ofey Chan
- [NFC] polish .github/workflows/scripts/build_colossalai_wheel.py code style (#1721) by Arsmart1
- [NFC] polish _checkpoint_hook.py code style (#1722) by LuGY
- [NFC] polish test_2p5d/checks_2p5d/check_operation_2p5d.py code style (#1718) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/zero/sharded_param/__init__.py code style (#1717) by CsRic
- [NFC] polish colossalai/nn/lr_scheduler/linear.py code style (#1716) by yuxuan-lou
- [NFC] polish tests/test_layers/test_2d/checks_2d/check_operation_2d.py code style (#1715) by binmakeswell
- [NFC] polish colossalai/nn/metric/accuracy_2p5d.py code style (#1714) by shenggan
Fx
- [fx] add a symbolic_trace api. (#1812) by Super Daniel
- [fx] skip diffusers unittest if it is not installed (#1799) by Jiarui Fang
- [fx] Add linear metainfo class for auto parallel (#1783) by Boyuan Yao
- [fx] support module with bias addition (#1780) by YuliangLiu0306
- [fx] refactor memory utils and extend shard utils. (#1754) by Super Daniel
- [fx] test tracer on diffuser modules. (#1750) by Super Daniel
Hotfix
- [hotfix] fix build error when torch version >= 1.13 (#1803) by xcnick
- [hotfix] polish flash attention (#1802) by oahzxl
- [hotfix] fix zero's incompatibility with checkpoint in torch-1.12 (#1786) by HELSON
- [hotfix] polish chunk import (#1787) by Jiarui Fang
- [hotfix] autoparallel unit test (#1752) by YuliangLiu0306
Pipeline
- [Pipeline] Adapt to Pipelinable OPT (#1782) by Ziyue Jiang
Ci
- [CI] downgrade fbgemm. (#1778) by Super Daniel
Compatibility
- [compatibility] ChunkMgr import error (#1772) by Jiarui Fang
Fx/profiler
- [fx/profiler] debug the fx.profiler / add an example test script for fx.profiler (#1730) by Super Daniel
Full Changelog: v0.1.11rc1...v0.1.11rc2