We introduce Importance-Aware Quantization (IAQ) for large language models (LLMs). IAQ is a quantization method, still under active development, that leverages the importance of each weight channel to quantize the model.
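The exact IAQ algorithm is still being finalized, but the general recipe in importance-aware methods (e.g., AWQ) is to estimate a per-channel importance score from calibration activations and rescale salient weight channels before uniform quantization so they suffer less rounding error. The sketch below illustrates that general idea only; it is not the released IAQ implementation, and the `pseudo_quantize` helper and square-root scaling rule are assumptions:

```python
import torch

def pseudo_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Uniform asymmetric quantize-dequantize per output channel (illustration only)."""
    w_max = w.amax(dim=1, keepdim=True)
    w_min = w.amin(dim=1, keepdim=True)
    q_max = 2 ** n_bits - 1
    scale = (w_max - w_min).clamp(min=1e-5) / q_max
    zero = (-w_min / scale).round()
    return (torch.clamp((w / scale).round() + zero, 0, q_max) - zero) * scale

def importance_aware_quantize(w: torch.Tensor, act: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Scale weight input channels by an importance score before quantization,
    then fold the inverse scale back so the layer output is approximately preserved.
    w: [out_features, in_features]; act: calibration activations [n_tokens, in_features]."""
    importance = act.abs().mean(dim=0)     # per-input-channel importance score
    s = importance.clamp(min=1e-5).sqrt()  # hypothetical scaling rule, not IAQ's
    w_q = pseudo_quantize(w * s.unsqueeze(0), n_bits)
    return w_q / s.unsqueeze(0)            # fold the scale back out
```

Important channels are scaled up before rounding and scaled back down afterwards, so their relative rounding error shrinks, at the cost of slightly coarser effective resolution for unimportant channels.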
- Clone this repository and navigate to the llm-iaq folder:

```bash
git clone https://github.com/natsunoshion/llm-iaq
cd llm-iaq
```
- Install the package:

```bash
conda create -n iaq python=3.10 -y
conda activate iaq
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
- Install the efficient W4A16 (4-bit weight, 16-bit activation) CUDA kernel and the optimized FP16 kernels (e.g., LayerNorm, positional encodings); a sketch of the W4A16 numerics follows this step:

```bash
cd iaq/kernels
python setup.py install
```
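For reference, W4A16 means the packed 4-bit integer weights are dequantized to FP16 on the fly and multiplied against FP16 activations; the CUDA kernel fuses these steps. The PyTorch snippet below only sketches the equivalent numerics, and its names and per-output-channel layout are assumptions, not the kernel's actual interface:

```python
import torch

def w4a16_matmul(x_fp16: torch.Tensor,
                 w_int4: torch.Tensor,
                 scale: torch.Tensor,
                 zero: torch.Tensor) -> torch.Tensor:
    """Reference numerics for W4A16: dequantize 4-bit weights to FP16, then matmul.
    w_int4: [out, in] integers in [0, 15]; scale, zero: per-output-channel, [out, 1]."""
    w_fp16 = (w_int4.to(torch.float16) - zero) * scale  # dequantize weights to FP16
    return x_fp16 @ w_fp16.t()                          # activations stay in FP16
```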
We provide scripts for testing IAQ. The ./scripts folder contains run scripts that quantize (multimodal) large language models with IAQ. For example, to test the method on Llama 3.2 1B:

```bash
cd scripts
bash llama3.2-1b.sh
```
We thank the authors of LLM.int8(), AWQ, and SmoothQuant for their great work.