This repository has been archived by the owner on Sep 13, 2023. It is now read-only.
Serve models trained on GPU on CPU, and vice versa #658
Labels: `gpu` (Loading and serving models on GPU), `ml-framework` (ML Framework support), `use-case` (Use cases MLEM should support)
Right now if you train a model on GPU and `save` it with MLEM, but then try to `load`/`serve` it on CPU, it simply breaks. The only workaround that exists now is to convert the model to CPU before saving it.
We need to make this work. We can check how this is done in other generic tools that save & serve models.
This extends not only to serving a model locally, but also to deploying: for example, fly doesn't have GPUs, so even if you managed to deploy the model, it would break there.
Vice versa, if the model was trained on CPU but you want to `serve` it on GPU, MLEM should give a way to do this. A special case is if you want to `load_meta` your model (along with pre/post-processors): then you work with a `MlemModel` object (not the PyTorch model you can get from `load`), and you need a way to specify the device to run it on.