Add convenience functions to save/load quantized models to/from disk #411
This probably belongs more to Axon than Bumblebee, since we need a way to store the quantized model state.
There is the model, and there is the model state. Quantized tensors can be serialized with Nx and Safetensors. Is it just `model_state.data` that needs to be serialized, or is there more? Loading and quantization alone took around 3 minutes, and starting/loading it on the GPU again took over 3 minutes. I would like to speed that up…
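For the Safetensors route, a minimal sketch using the `safetensors` hex package might look like the following. Note the assumption that the parameters are a flat map of name → tensor; Bumblebee's model state is nested, so a real implementation would need to flatten it first:

```elixir
# Sketch only: assumes a flat %{name => tensor} map, which is what the
# safetensors format expects. Bumblebee's model_state.data is nested.
params_map = %{"dense.kernel" => Nx.tensor([[1.0, 2.0], [3.0, 4.0]])}

# Dump to the safetensors binary format and write it to disk
File.write!("params.safetensors", Safetensors.dump(params_map))

# Read it back into a map of Nx tensors
loaded = "params.safetensors" |> File.read!() |> Safetensors.load!()
```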
Oh, I missed that. For the model state you can actually do:

```elixir
# Serialize
File.write!("state.nx", Nx.serialize(model_info.params))

# Load
{:ok, spec} = Bumblebee.load_spec({:hf, "..."})
model = spec |> Bumblebee.build_model() |> Axon.Quantization.quantize_model()
params = File.read!("state.nx") |> Nx.deserialize()
model_info = %{spec: spec, model: model, params: params}
```

This may work for your use case if you have enough RAM to serialize and deserialize. There are two issues with…
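For reference, the convenience functions this issue asks for could be sketched along these lines. The module and function names (`MyApp.QuantizedModels`, `save_params/2`, `load_quantized/2`) are hypothetical, not part of Bumblebee's API, and the sketch only wraps the `Nx.serialize` approach shown above:

```elixir
defmodule MyApp.QuantizedModels do
  @moduledoc """
  Hypothetical convenience wrappers around the snippet above; not part of
  Bumblebee. Uses Nx.serialize/1, so the whole state must fit in RAM.
  """

  # Save the (already quantized) parameters to disk.
  def save_params(model_info, path) do
    File.write!(path, Nx.serialize(model_info.params))
  end

  # Rebuild the quantized model graph and load the serialized parameters.
  def load_quantized(repo, path) do
    {:ok, spec} = Bumblebee.load_spec(repo)
    model = spec |> Bumblebee.build_model() |> Axon.Quantization.quantize_model()
    params = path |> File.read!() |> Nx.deserialize()
    %{spec: spec, model: model, params: params}
  end
end
```

Usage would then mirror loading from HF: `model_info = MyApp.QuantizedModels.load_quantized({:hf, "..."}, "state.nx")`.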
Since loading quantized models from HF is not possible yet, I was searching for a way to save models after quantization that is as easy as loading them from HF, and then a function to load the file again.