
[WIP] Pixtral #681

Draft · casper-hansen wants to merge 2 commits into base: main
Conversation

casper-hansen (Owner) commented Dec 13, 2024

There are two examples in this PR:

quantize.py

This is a text-only example of quantizing Pixtral. You will need to install a specific transformers PR for this to work: huggingface/transformers#34502


pip install git+https://github.com/zucchini-nlp/transformers@llavas
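
For reference, here is a minimal sketch of what a text-only quantization flow along these lines could look like, using the standard AutoAWQ API. The model id, output path, and quant config values below are assumptions for illustration, not taken from this PR's quantize.py:

```python
# Minimal sketch, assuming the standard AutoAWQ text quantization flow.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistral-community/pixtral-12b"  # assumed model id
quant_path = "pixtral-12b-awq"                # assumed output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantize using the default text calibration data
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```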

pixtral_multimodal.py

(NOT WORKING YET!) This is a multimodal example of quantizing Pixtral. The initial fixes included in this PR solve some of the device issues and other smaller issues.

Major issue: the attention weights / QKV values seem to expand from 1429 to 2858, and we are not able to capture this correctly (a minimal sketch reproducing the shape mismatch follows the traceback below).

Traceback (most recent call last):
  File "/workspace/AutoAWQ/examples/quantize.py", line 126, in <module>
    model.quantize(calib_data=inputs, quant_config=quant_config, quantizer_cls=PixtralAwqQuantizer)
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/AutoAWQ/awq/models/base.py", line 240, in quantize
    self.quantizer.quantize()
  File "/workspace/AutoAWQ/awq/quantize/quantizer.py", line 201, in quantize
    scales_list = [
                  ^
  File "/workspace/AutoAWQ/awq/quantize/quantizer.py", line 202, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/AutoAWQ/awq/quantize/quantizer.py", line 362, in _search_best_scale
    fp16_output = self._module_forward(inp, module2inspect, module_kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/AutoAWQ/awq/quantize/quantizer.py", line 282, in _module_forward
    module_output = module(x, **module_kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/models/mistral/modeling_mistral.py", line 456, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The expanded size of the tensor (2858) must match the existing size (1429) at non-singleton dimension 3.  Target sizes: [8, 32, 1429, 2858].  Tensor sizes: [8, 1, 1429, 1429]
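
For illustration, the broadcast failure above can be reproduced in isolation, assuming the attention mask was built for a key/value length of 1429 while the keys/values passed to SDPA have length 2858. Batch size, head count, and head dim are shrunk here to keep the toy example cheap; only the 1429/2858 sequence lengths are taken from the traceback:

```python
import torch
import torch.nn.functional as F

# Query length 1429, key/value length 2858 (as in the traceback),
# but the boolean attention mask was built for a 1429-long key/value sequence.
q = torch.randn(1, 2, 1429, 8)
k = torch.randn(1, 2, 2858, 8)
v = torch.randn(1, 2, 2858, 8)
attn_mask = torch.ones(1, 1, 1429, 1429, dtype=torch.bool)

# SDPA tries to broadcast the mask's last dim (1429) to the key length (2858)
# and fails with a RuntimeError of the same form as above:
# "The expanded size of the tensor (2858) must match the existing size (1429)
#  at non-singleton dimension 3."
F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```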
