
[WIP] Pixtral #681

Draft · casper-hansen wants to merge 2 commits into base: main
Conversation

casper-hansen (Owner) commented Dec 13, 2024

There are two examples in this PR:

quantize.py

This is a text-only example of quantizing Pixtral. You will need to install a specific transformers PR for this to work: huggingface/transformers#34502


pip install git+https://github.com/zucchini-nlp/transformers@llavas
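
For reference, here is a minimal sketch of what a text-only quantization flow along these lines could look like, using the standard AutoAWQ API. The model id, output path, and quant config values below are assumptions for illustration, not taken from this PR's quantize.py:

```python
# Minimal sketch, assuming the standard AutoAWQ text quantization flow.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistral-community/pixtral-12b"  # assumed model id
quant_path = "pixtral-12b-awq"                # assumed output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantize using the default text calibration data
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```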

pixtral_multimodal.py

(NOT WORKING YET!) This is a multimodal example of quantizing Pixtral. The initial fixes included in this PR solve some of the device issues and other smaller issues.

Major issue: the attention weights / QKV values seem to expand from 1429 to 2858, and we are not able to capture this correctly (a minimal sketch reproducing the shape mismatch follows the traceback below).

Traceback (most recent call last):
  File "/workspace/AutoAWQ/examples/quantize.py", line 126, in <module>
    model.quantize(calib_data=inputs, quant_config=quant_config, quantizer_cls=PixtralAwqQuantizer)
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/AutoAWQ/awq/models/base.py", line 240, in quantize
    self.quantizer.quantize()
  File "/workspace/AutoAWQ/awq/quantize/quantizer.py", line 201, in quantize
    scales_list = [
                  ^
  File "/workspace/AutoAWQ/awq/quantize/quantizer.py", line 202, in <listcomp>
    self._search_best_scale(self.modules[i], **layer)
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/AutoAWQ/awq/quantize/quantizer.py", line 362, in _search_best_scale
    fp16_output = self._module_forward(inp, module2inspect, module_kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/AutoAWQ/awq/quantize/quantizer.py", line 282, in _module_forward
    module_output = module(x, **module_kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/models/mistral/modeling_mistral.py", line 456, in forward
    attn_output = torch.nn.functional.scaled_dot_product_attention(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The expanded size of the tensor (2858) must match the existing size (1429) at non-singleton dimension 3.  Target sizes: [8, 32, 1429, 2858].  Tensor sizes: [8, 1, 1429, 1429]
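
For illustration, the broadcast failure above can be reproduced in isolation, assuming the attention mask was built for a key/value length of 1429 while the keys/values passed to SDPA have length 2858. Batch size, head count, and head dim are shrunk here to keep the toy example cheap; only the 1429/2858 sequence lengths are taken from the traceback:

```python
import torch
import torch.nn.functional as F

# Query length 1429, key/value length 2858 (as in the traceback),
# but the boolean attention mask was built for a 1429-long key/value sequence.
q = torch.randn(1, 2, 1429, 8)
k = torch.randn(1, 2, 2858, 8)
v = torch.randn(1, 2, 2858, 8)
attn_mask = torch.ones(1, 1, 1429, 1429, dtype=torch.bool)

# SDPA tries to broadcast the mask's last dim (1429) to the key length (2858)
# and fails with a RuntimeError of the same form as above:
# "The expanded size of the tensor (2858) must match the existing size (1429)
#  at non-singleton dimension 3."
F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```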
