Kernel impl support for Llama 3.x #7

Open · wants to merge 3 commits into base: main
README.md: 13 changes (10 additions, 3 deletions)
@@ -49,12 +49,19 @@ bash setup.sh
```

## Small Demo
Run example:
Run example for the PyTorch implementation:

```
python examples/run_tidal_llama.py --top_k 256 --model_name gradientai/Llama-3-8B-Instruct-Gradient-1048k
```
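
For orientation, here is a minimal sketch of what a demo script driven by `--top_k` and `--model_name` typically does. It is an assumption-based illustration, not the actual contents of `examples/run_tidal_llama.py`; how `top_k` is wired into the attention layers is specific to TidalDecode and not shown here.

```
# Hypothetical sketch only -- the real examples/run_tidal_llama.py may differ.
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer


def main(args):
    # Load the tokenizer and model named on the command line.
    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
    model = AutoModelForCausalLM.from_pretrained(args.model_name, device_map="auto")

    # args.top_k would be the per-step token budget for sparse attention.
    prompt = "Tell me a story."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--top_k", type=int, default=256)
    parser.add_argument("--model_name", type=str, required=True)
    main(parser.parse_args())
```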

Run example for the CUDA implementation:

```
cd scripts
bash test.sh
```
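
The contents of `scripts/test.sh` are not shown in this diff. A typical correctness check for a custom CUDA attention kernel compares its output against a dense PyTorch reference, along the lines of the sketch below; the kernel entry point named here is hypothetical, not the repo's API.

```
# Hypothetical sanity check -- tidal_cuda_attention is a placeholder name.
import torch


def reference_attention(q, k, v):
    # Dense PyTorch attention used as the ground truth.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v


q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

expected = reference_attention(q, k, v)
# actual = tidal_cuda_attention(q, k, v, top_k=256)
# torch.testing.assert_close(actual, expected, rtol=1e-2, atol=1e-2)
```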

## Performance Evaluation
Run Needle-in-the-Haystack:

@@ -88,8 +95,8 @@ bash bench_efficiency_e2e.sh

## Future Plan
This repo mainly reproduces the results in our [paper](https://arxiv.org/abs/2410.05076). As TidalDecode is flexible in the choice of the token selection layer, we are developing a library to support the efficient deployment of our method with flexible model configurations that suit users' accuracy/efficiency requirements.
- [ ] Llama3 Model Support + GQA
- [ ] Independent top-k selection by head
- [x] Llama3 Model Support + GQA
- [x] Independent top-k selection by head
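
As an illustration of the "independent top-k selection by head" item above, the sketch below shows the general technique in plain PyTorch: each attention head keeps only its own highest-scoring key positions. This is a generic, assumption-based example, not the kernel added by this PR.

```
# Illustrative per-head top-k token selection (not TidalDecode's kernel).
import torch


def per_head_topk_attention(q, k, v, top_k):
    # q: [batch, heads, 1, dim] (single decode step); k, v: [batch, heads, seq, dim]
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)  # [batch, heads, 1, seq]
    # Each head independently keeps its top_k highest-scoring positions.
    topk = torch.topk(scores, k=min(top_k, scores.shape[-1]), dim=-1)
    sparse = torch.full_like(scores, float("-inf"))
    sparse.scatter_(-1, topk.indices, topk.values)
    # Non-selected positions get zero weight after the softmax.
    return torch.softmax(sparse, dim=-1) @ v
```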

## Reference
```
examples/run_tidal_llama.py: 2 changes (1 addition, 1 deletion)
@@ -82,4 +82,4 @@ def main(args):

args = parser.parse_args()

main(args)
main(args)