This repository has been archived by the owner on Sep 13, 2023. It is now read-only.

Serve models trained on GPU on CPU, and vice versa #658

Open
aguschin opened this issue Apr 26, 2023 · 0 comments
Labels
gpu Loading and serving models on GPU ml-framework ML Framework support use-case Use cases MLEM should support

Comments

@aguschin
Contributor

Right now, if you train a model on GPU, save it with MLEM, and then try to load or serve it on CPU, it simply breaks.
The only workaround that exists now is to convert the model to CPU before saving it.
We need to make this work:

  • load the model to CPU if GPU is not available
  • add an option to specify the device the model should be loaded to
    We can check how this is done in other generic tools that save & serve models.
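The two bullets above boil down to a small piece of device-resolution logic: fall back to CPU when no GPU exists, but honor an explicit choice. A minimal sketch in plain Python (the `resolve_device` helper and its arguments are illustrative, not existing MLEM API):

```python
def resolve_device(requested=None, cuda_available=False):
    """Pick the device a model should be loaded to.

    requested: explicit device string ("cuda", "cpu", ...) or None.
    cuda_available: whether a GPU is actually present
        (in practice, torch.cuda.is_available()).
    """
    if requested is not None:
        # Honor an explicit user choice, but fail early if it cannot work.
        if requested.startswith("cuda") and not cuda_available:
            raise RuntimeError(
                f"Device {requested!r} requested but no GPU is available"
            )
        return requested
    # No preference given: fall back to CPU when there is no GPU.
    return "cuda" if cuda_available else "cpu"
```

In a real loader the result would feed into something like `torch.load(path, map_location=resolve_device(...))`, so that tensors saved on GPU are remapped instead of crashing.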

This extends not only to serving the model locally, but also to deploying it - for example, fly doesn't offer GPUs, so even if you managed to deploy the model, it would break there.

Vice versa, if the model was trained on CPU but you want to serve it on GPU, MLEM should provide a way to do this. A special case is when you load_meta your model (along with pre/post-processors): you then work with an MlemModel object (not the PyTorch model you get from load) and need a way to specify the device to run it on.
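One shape the MlemModel case could take is a device argument threaded through loading and applied via the usual `.to(device)` protocol that PyTorch modules and tensors follow. A sketch under that assumption - `load_model_to` and `DummyModel` are hypothetical names for illustration, not MLEM API:

```python
def load_model_to(load_fn, device):
    """Load a model via load_fn() and move it to `device` if it supports .to().

    load_fn stands in for MLEM's load/load_meta; `device` is the
    hypothetical option this issue asks for.
    """
    model = load_fn()
    if hasattr(model, "to"):
        # PyTorch modules and tensors expose .to(device); other frameworks
        # that don't are simply returned as loaded.
        model = model.to(device)
    return model


class DummyModel:
    """Minimal stand-in that records where it was moved."""
    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self
```

Usage: `load_model_to(DummyModel, "cuda")` returns a model whose `.device` is `"cuda"`, while objects without a `.to()` method pass through untouched.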

@aguschin aguschin added ml-framework ML Framework support use-case Use cases MLEM should support gpu Loading and serving models on GPU labels Apr 26, 2023