LLM-IAQ: Importance-Aware Quantization for Large Language Models

Introduction

We introduce Importance-Aware Quantization (IAQ) for large language models (LLMs). IAQ is a quantization method, still under active development, that leverages the per-channel importance of the weights to quantize the model.
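The exact IAQ algorithm is defined by the code in this repository. As an illustrative sketch only (the function name, the square-root importance scaling, and all hyperparameters below are our assumptions, not this repo's implementation), importance-aware 4-bit weight quantization in the AWQ style might scale each input channel by its importance before round-to-nearest quantization, then fold the scales back out:

```python
import numpy as np

def importance_aware_quantize(w, importance, n_bits=4):
    """Illustrative sketch of importance-aware quantization (not the repo's
    actual implementation). Scale weight channels by an importance score
    before symmetric round-to-nearest quantization, then undo the scaling,
    so important channels get finer effective precision.

    w: (out_features, in_features) weight matrix
    importance: one non-negative score per input channel
    """
    # Derive a per-channel scale from importance (the exponent 0.5 is an
    # assumed hyperparameter); normalize so scales are centred around 1.
    s = importance ** 0.5
    s = s / s.mean()
    w_scaled = w * s  # emphasise important channels before rounding

    # Symmetric per-output-row round-to-nearest quantization to n_bits.
    qmax = 2 ** (n_bits - 1) - 1
    step = np.abs(w_scaled).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w_scaled / step), -qmax - 1, qmax)

    # Dequantize and fold the channel scales back out.
    return (q * step) / s

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)
importance = np.abs(rng.standard_normal(16)) + 0.5  # toy importance scores
w_q = importance_aware_quantize(w, importance)
print(np.abs(w - w_q).mean())  # mean absolute quantization error
```

In practice the importance scores would come from calibration data (e.g. activation magnitudes on a held-out set), and the quantized integers and scales would be stored for the W4A16 kernels rather than dequantized immediately.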

Install

1. Clone this repository and navigate to the llm-iaq folder:

```shell
git clone https://github.com/natsunoshion/llm-iaq
cd llm-iaq
```

2. Install the package:

```shell
conda create -n iaq python=3.10 -y
conda activate iaq
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

3. Install the efficient W4A16 (4-bit weight, 16-bit activation) CUDA kernel and the optimized FP16 kernels (e.g. LayerNorm, positional encodings):

```shell
cd iaq/kernels
python setup.py install
```

Scripts

We provide scripts to test our IAQ method. The `./scripts` folder contains code for running (multimodal) large language models quantized with IAQ.

For example, run:

```shell
cd scripts
bash llama3.2-1b.sh
```

to test our method.

Acknowledgement

We thank the authors of LLM.int8(), AWQ, and SmoothQuant for their great work.
