Dear all,

I have recently started running some calculations with DeePMD-kit on a cluster equipped with GPU cards, but I noticed that the GPU utilization during training is unexpectedly low (<5%). I downloaded the latest Linux release of DeePMD-kit from the GitHub repository and installed it with the provided .sh script. CUDA 10.1 and the compatible cuDNN libraries are properly installed on the cluster nodes, and the TensorFlow version I am using is 2.3.0. As a check, I tested another TensorFlow-based program for molecules, and its GPU usage is about 50%. Even after increasing the batch size and/or the number of training samples, the GPU utilization never rose above 5%, which I suspect is not the expected behavior. Has anyone run into a similar problem before? Maybe I am missing some other external library required by DeePMD-kit.

I would really appreciate any help or tips you can provide.
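A quick way to rule out a broken CUDA/cuDNN installation is to check whether TensorFlow itself sees the GPU and actually places operations on it. A minimal sketch, assuming the same TF 2.3 Python environment used for training:

```python
import tensorflow as tf  # 2.3.0, as in the setup described above

# Log device placement decisions; set this before running any ops.
tf.debugging.set_log_device_placement(True)

# An empty list here would mean TensorFlow is not picking up the
# CUDA 10.1 / cuDNN stack at all.
print(tf.config.list_physical_devices("GPU"))

# Run a small op to confirm it executes on the GPU rather than
# silently falling back to the CPU.
a = tf.random.uniform((1000, 1000))
print(tf.linalg.matmul(a, a).device)  # expect ".../device:GPU:0"
```

If the device list is non-empty and the matmul lands on GPU:0, the install itself is likely fine and low utilization points to the workload instead.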
Replies: 1 comment

The GPU utilization depends on the workload of the task. Remember that a GPU is very powerful computing hardware, so one needs to supply enough work to utilize it fully. Typically, systems with more than 1000 atoms can fully utilize the computational power of one GPU card.
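To see whether utilization is uniformly low or merely bursty during training, it can help to sample it at a finer interval than occasional nvidia-smi calls. A minimal sketch, assuming the pynvml Python bindings (installable as nvidia-ml-py3) are available and the training job runs on GPU 0:

```python
import time
import pynvml  # assumed installed, e.g. `pip install nvidia-ml-py3`

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust as needed

# Sample compute and memory utilization every 0.2 s while the
# training job is running in another process.
for _ in range(50):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"gpu {util.gpu:3d}%   mem {util.memory:3d}%")
    time.sleep(0.2)

pynvml.nvmlShutdown()
```

If utilization stays near zero even for systems with well over 1000 atoms and larger batch sizes, the installation is worth a second look; if it spikes periodically, the GPU is simply not being given enough work per step.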