
[DOC]: Environment installation failed #6066

Open
eccct opened this issue Sep 21, 2024 · 3 comments
Labels
documentation Improvements or additions to documentation

Comments

eccct commented Sep 21, 2024

📚 The doc issue

Windows 11, with Ubuntu 24.04 installed as a WSL2 subsystem.
Followed the guide at https://colossalai.org/zh-Hans/docs/get_started/installation
The exact steps were as follows:
export CUDA_INSTALL_DIR=/usr/local/cuda-12.1
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export PATH="$CUDA_HOME/bin:$PATH"
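A quick sanity check that the exports above took effect can help before going further; this is a sketch assuming the toolkit lives at /usr/local/cuda-12.1 as in the steps above:

```shell
# Re-export the CUDA paths, then confirm they resolved as intended.
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export PATH="$CUDA_HOME/bin:$PATH"

echo "CUDA_HOME=$CUDA_HOME"
# nvcc only resolves once the toolkit is actually installed at that path
command -v nvcc >/dev/null 2>&1 || echo "nvcc not on PATH yet"
```

If nvcc is still not found after the runfile install below, the version directory in CUDA_HOME likely does not match the installed one.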

conda create -n colo01 python=3.10
conda activate colo01
export PATH=~/miniconda3/envs/colo01/bin:$PATH

sudo apt update
sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 60
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 60
sudo update-alternatives --config gcc
gcc --version
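Note a possible mismatch in the steps above: apt installs gcc-10/g++-10, but update-alternatives registers gcc-12/g++-12, which may not exist on the system. A sketch that detects which major version is actually present (the candidate list 12/11/10 is an assumption) and prints the matching registration commands:

```shell
# Find an installed gcc major version and show the update-alternatives
# lines for it (printed rather than run, since they need sudo).
for v in 12 11 10; do
    if command -v "gcc-$v" >/dev/null 2>&1; then
        echo "found gcc-$v; register it with:"
        echo "  sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$v 60"
        echo "  sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-$v 60"
        break
    fi
done
# Confirm which compiler is currently selected
command -v gcc >/dev/null 2>&1 && gcc --version | head -n1
```

Registering a gcc path that does not exist leaves /usr/bin/gcc dangling, which can silently break CUDA extension builds later.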

wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run
Verify the CUDA installation: nvidia-smi

conda install nvidia/label/cuda-12.1.0::cuda-toolkit
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
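Before building ColossalAI it is worth confirming that the conda-installed PyTorch imports and sees CUDA; a minimal sketch, guarded so it only prints a message where torch is not importable:

```shell
# Print the torch version and whether torch can see a CUDA device.
python -c 'import torch; print(torch.__version__, torch.cuda.is_available())' \
    2>/dev/null || echo "torch not importable in this shell"
```

If torch.cuda.is_available() prints False under WSL2, the extension build and the benchmark below will both fail regardless of the rest of the setup.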

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

pip install -r requirements/requirements.txt
CUDA_EXT=1 pip install .

Install the related development libraries
pip install transformers
pip install xformers
pip install datasets tensorboard

Run the benchmark
Step 1: change directory
cd examples/language/llama/scripts/benchmark_7B
Edit gemini.sh
bash gemini.sh

Execution fails with the following error:
[rank0]: ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

Then installed FlashAttention-2 successfully:
pip install packaging
pip install ninja
ninja --version
echo $?
conda install -c conda-channel attention2
pip install flash-attn --no-build-isolation
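Since transformers only raises the FlashAttention2 error at runtime, a direct import check (a sketch, guarded against missing python) confirms whether flash-attn actually built against the active torch/CUDA pair:

```shell
# flash_attn fails at import time if it was built against a mismatched
# torch/CUDA; check it directly rather than via the benchmark script.
python -c 'import flash_attn; print(flash_attn.__version__)' \
    2>/dev/null || echo "flash_attn import failed (rebuild against the current torch)"
```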

Running bash gemini.sh again still fails. Please take a look based on the attached log file; it would also be great if the installation docs could be improved accordingly. Thanks!
gcc_nvidia-smi_pytorch_python
log.txt

eccct added the documentation label Sep 21, 2024

eccct (Author) commented Sep 21, 2024

Ran the benchmark again; after bash gemini.sh the system stalls at the compilation stage for a long time.
(screenshot: bash gemini.sh)
Used colossalai check -i to check the version compatibility and CUDA extension status in the current environment.
(screenshot: colossalai check output)
Please help analyze the cause, thanks!

Edenzzzz (Contributor) commented

You should use BUILD_EXT=1 pip install . and see if that compiles.
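A sketch of the suggested rebuild, assuming it is run from the ColossalAI repo root (guarded so it is skipped elsewhere); the BUILD_EXT flag appears to have replaced CUDA_EXT in newer ColossalAI versions:

```shell
# Rebuild ColossalAI with the CUDA extensions compiled at install time.
if [ -f setup.py ] && grep -qi colossalai setup.py 2>/dev/null; then
    pip uninstall -y colossalai || true   # drop the previous CUDA_EXT build first
    BUILD_EXT=1 pip install .             # compile CUDA kernels during install
    colossalai check -i                   # then verify the extension status
else
    echo "not in the ColossalAI source tree; skipping rebuild"
fi
```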
