diff --git a/README.md b/README.md
index e68ccea..83ccbb9 100644
--- a/README.md
+++ b/README.md
@@ -27,9 +27,9 @@ or request the LLM to perform a certain task:
 echo "Translate into German: thank you" | ./ask-llm.py
 ```
 
-To use it locally with [llama.cpp](https://github.com/ggerganov/llama.cpp) inference engine, make sure to load a suitable model that utilizes the [ChatML format](https://github.com/openai/openai-python/blob/release-v0.28.0/chatml.md) (example: [TinyLLama](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF), [OpenHermes 2.5](https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF), etc). Set the environment variable `LLM_API_BASE_URL` accordingly:
+To use it locally with [llama.cpp](https://github.com/ggerganov/llama.cpp) inference engine, make sure to load a quantized model (example: [TinyLLama](https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF), [Gemma 2B](https://huggingface.co/google/gemma-2b-it-GGUF), [OpenHermes 2.5](https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF), etc) with the suitable chat template. Set the environment variable `LLM_API_BASE_URL` accordingly:
 
 ```bash
-~/llama.cpp/server -m tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
+~/llama.cpp/server -m gemma-2b-it-q4_k_m.gguf --chat-template gemma
 export LLM_API_BASE_URL=http://127.0.0.1:8080/v1
 ```
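
As a quick sanity check of the change above (a sketch, not part of the patch): with the llama.cpp server started as in the new snippet and `LLM_API_BASE_URL` exported, the server answers OpenAI-compatible chat completion requests directly, so a plain `curl` call should confirm the endpoint works before running `ask-llm.py`:

```bash
# Sketch only: probe the OpenAI-compatible endpoint the patch points LLM_API_BASE_URL at.
# llama.cpp serves the single model it loaded, so a "model" field should not be required here.
curl -s "$LLM_API_BASE_URL/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Translate into German: thank you"}]}'
```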