This repository has been archived by the owner on Sep 13, 2023. It is now read-only.
Serve models trained on GPU on CPU, and vice versa #658
Labels: `gpu` (Loading and serving models on GPU), `ml-framework` (ML Framework support), `use-case` (Use cases MLEM should support)
Right now if you train a model on GPU and `save` it with MLEM, but then try to `load`/`serve` it on CPU, it simply breaks. The only workaround that exists now is to convert the model to CPU before saving it.
We need to make this work. We can check how this is done in other generic tools that save & serve models.
This extends not only to serving a model locally, but also to deploying: for example, fly doesn't have GPUs, so even if you managed to deploy the model, it would break there.
Vice versa, if the model was trained on CPU but you want to `serve` it on GPU, MLEM should give a way to do this. A special case is if you want to `load_meta` your model (along with pre/post-processors): then you work with a `MlemModel` object (not the PyTorch model you can get from `load`), and you need a way to specify the device to run it on.