Cannot run LLM models on GPU built with GGML_HIPBLAS [AMD GPU | Ubuntu 22.04.4.] #9139

Answered by 8XXD8
Allan-Luu asked this question in Q&A

I can run any llama.cpp-supported model on a single GPU, on multiple GPUs, or with GPU and partial CPU offload, without any issue.
My system has 3x Radeon Pro VIIs and a single MI25. I had to add export HSA_ENABLE_SDMA=0 to .bashrc to get the MI25 working with ROCm 6+, but that shouldn't be necessary for the MI50s.
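For anyone hitting the same MI25 problem, a minimal sketch of what that workaround looks like in practice:

```bash
# Workaround from above: disable SDMA so the MI25 initializes under ROCm 6+.
# Append to ~/.bashrc (or export it in the shell before launching llama.cpp),
# then open a new shell or run: source ~/.bashrc
export HSA_ENABLE_SDMA=0
```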

I installed ROCm with amdgpu-install, no Docker, and llama.cpp works fine, though I'm on a much newer kernel.
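For reference, a rough sketch of what such a non-Docker setup typically looks like; the --usecase value, the gfx906 target (MI50 / Radeon Pro VII), and the build line are assumptions based on AMD's installer and llama.cpp's documented HIP build, not commands copied from this thread:

```bash
# ROCm via AMD's installer script, no Docker:
sudo amdgpu-install --usecase=rocm
sudo usermod -aG render,video "$USER"   # re-login so the group change applies

# Rebuild llama.cpp with the HIP backend (GGML_HIPBLAS, as in the title).
# AMDGPU_TARGETS is an assumption: gfx906 matches MI50 / Radeon Pro VII cards.
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build -DGGML_HIPBLAS=ON -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j
```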

Are you still getting segmentation faults, or is it just CUDA error: out of memory?
By default llama.cpp puts the context on the first GPU but spreads the layers across the available GPUs. Since your model has a 32k context size, it is possible that you are out of memory on a…
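To make that concrete, a rough sketch of the two usual ways to relieve the first GPU; the binary path, model file, context sizes, and split ratios below are placeholders, not values from this thread:

```bash
# Option 1: shrink the context so the KV cache held on the first GPU fits.
./build/bin/llama-cli -m ./models/model.gguf -ngl 99 -c 8192 -p "test"

# Option 2: keep the full 32k context, but weight the layer split away from
# the GPU that also holds the context (GPU 0 by default, see --main-gpu).
./build/bin/llama-cli -m ./models/model.gguf -ngl 99 -c 32768 \
    --main-gpu 0 --tensor-split 1,2,2,2
```

The --tensor-split values are relative weights, so 1,2,2,2 places roughly half as many layers on the first GPU as on each of the others, leaving it more room for the context.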
