kompute : disable GPU offload for Mixtral
We haven't implemented the necessary GPU kernels yet.

Fixes this crash:

ggml_vk_graph_compute: error: unsupported op 'ARGSORT'
GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-kompute.cpp:1508: !"unsupported op"

Signed-off-by: Jared Van Bortel <[email protected]>
cebtenzzre committed Feb 5, 2024
1 parent 06ba998 commit 315102f
Showing 1 changed file with 1 addition and 0 deletions: llama.cpp
@@ -4138,6 +4138,7 @@ static int llama_model_load(const std::string & fname, llama_model & model, llam
 #ifdef GGML_USE_KOMPUTE
     if (params.n_gpu_layers > 0 && (
         !(model.arch == LLM_ARCH_LLAMA || model.arch == LLM_ARCH_FALCON)
+        || model.hparams.n_expert > 0
         || !(
             model.ftype == LLAMA_FTYPE_ALL_F32 ||
             model.ftype == LLAMA_FTYPE_MOSTLY_F16 ||
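The added condition can be sketched as a standalone predicate. This is a simplified illustration of the guard's logic, not the actual code path in llama.cpp: the names `kompute_unsupported` and `model_info`, and the reduced enum, are hypothetical, and the real check also inspects `model.ftype` as shown in the diff above.

```cpp
// Sketch of the Kompute offload guard (illustrative names, assumptions noted above).
enum llm_arch { LLM_ARCH_LLAMA, LLM_ARCH_FALCON, LLM_ARCH_OTHER };

struct model_info {
    llm_arch arch;
    int      n_expert;  // > 0 for MoE models such as Mixtral
};

// Returns true when GPU offload must be disabled for the Kompute backend.
// MoE models route tokens to experts via an ARGSORT op, which the Kompute
// backend has no kernel for yet, so n_expert > 0 forces the CPU path.
bool kompute_unsupported(const model_info & m, int n_gpu_layers) {
    return n_gpu_layers > 0 && (
        !(m.arch == LLM_ARCH_LLAMA || m.arch == LLM_ARCH_FALCON)
        || m.n_expert > 0
    );
}
```

With this shape of check, a Mixtral-style model (Llama architecture, 8 experts) is rejected even though its architecture is otherwise supported, while a dense Llama model still offloads.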
