Hello, I'm trying to perform LLM inference using various combinations of CPUs and/or AMD GPUs. Currently I can generate output from a prompt with llama.cpp on the CPU (though without controlling how many CPUs are used), but I'm running into a lot of issues when attempting to offload to the GPUs.

**What works**

Following the README, I built llama.cpp without the GPU offload feature, and CPU inference runs correctly.

**What doesn't work**

Compiling with GPU offload enabled (along the lines of the HIP build sketched below) and then running with layers offloaded to the GPUs.

**Errors**

These attempts at offloading to the GPU give me either `Segmentation fault (core dumped)` or `CUDA error: out of memory`, along with:

> Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope.

Example of the segmentation fault after running `export` …

Any help or guidance would be greatly appreciated!
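For concreteness, the HIP build described in the llama.cpp README looks roughly like this; this is a sketch rather than my exact command, the `gfx1030` target is just the README's example value, and older checkouts use `-DLLAMA_HIPBLAS=ON` instead of `-DGGML_HIP=ON`:

```bash
# ROCm/HIP build along the lines of the llama.cpp README (sketch; the
# AMDGPU_TARGETS value must match your GPU -- gfx1030 is only the README's example):
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j 16
```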
---
You are using the wrong target; the MI50 is `gfx906`.
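A quick way to check which target your cards actually are, assuming `rocminfo` from your ROCm install is on the PATH:

```bash
# List the gfx targets ROCm detects; an MI50 should report gfx906:
rocminfo | grep -i 'gfx'
# Then rebuild llama.cpp with that target, e.g. -DAMDGPU_TARGETS=gfx906
```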
I can run any llama.cpp-supported model on a single GPU, multiple GPUs, or GPU with partial CPU offload, without any issue.
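For what it's worth, a minimal sketch of how I pick devices for those runs; `HIP_VISIBLE_DEVICES` is the standard ROCm device selector, the binary name depends on your llama.cpp version (`main` in older trees, `llama-cli` in newer ones), and the model path is a placeholder:

```bash
# Single GPU: expose only device 0 (indices as shown by rocm-smi):
HIP_VISIBLE_DEVICES=0 ./build/bin/llama-cli -m ./models/model.gguf -ngl 99

# Multi GPU: expose several devices and let llama.cpp split the layers:
HIP_VISIBLE_DEVICES=0,1,2,3 ./build/bin/llama-cli -m ./models/model.gguf -ngl 99

# Partial CPU offload: only offload some of the layers to the GPU(s):
./build/bin/llama-cli -m ./models/model.gguf -ngl 20
```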
In my system I have 3x Radeon Pro VIIs and a single MI25. I had to add `export HSA_ENABLE_SDMA=0` to `.bashrc` to get the MI25 working with ROCm 6+, but that shouldn't be necessary for the MI50s. I installed ROCm with `amdgpu-install`, no Docker, and llama.cpp works fine, though I'm on a much newer kernel.

Are you still getting segmentation faults, or is it just `CUDA error: out of memory`?

By default llama.cpp puts the context on the first GPU but spreads the layers across the available GPUs, and your model has a 32k context size; it is possible that you are out of memory on a…
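If it is memory, these are the standard llama.cpp knobs I'd try first (the binary name and the values here are only illustrative):

```bash
# Shrink the context instead of using the model's 32k default (-c),
# offload fewer layers (-ngl), set the split ratio across GPUs (-ts),
# and choose which GPU holds the context buffers (-mg):
./build/bin/llama-cli -m ./models/model.gguf -c 4096 -ngl 32 -ts 1,1,1 -mg 0
```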