
CUDA Error: Out of Memory #422

Closed

brian1986 opened this issue Nov 4, 2018 · 20 comments

Comments

@brian1986

Hi Team,

I'm trying to train a pix2pix model on an AtoB set (edges) where I've already structured the pairs as a montage (A on one side, B on the other, combined into one image). I have roughly 12,000 images in my training set that I'd like to use. batch_size is already 1, so I can't reduce that further. I've turned off the visualizer but still get the error.

From nvidia-smi, I can see that GPU utilization spikes just after the networks are initialized (54.414M and 2.769M parameters for network G and network D, respectively).

This is the error:

File "C:\Users\acn.kiosk\Anaconda3\envs\pix2pix-pytorch\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory


I'm running Windows 10 with a Quadro M6000 (24GB of VRAM), Python 3.5.5, CUDA 9.2, and PyTorch 0.4.1 (the cuda92 build).

Any ideas? I'm at a loss...

Brian

@junyanz
Owner

junyanz commented Nov 5, 2018

What is the size of your training images?

@brian1986
Author

Hi JunYanz,

Thanks for the note. :)

The images are coming out of the webcam at 1920 x 1080, and I'm saving them as 480x360 sets (1/4 scale). I'm then joining these together to form 960x360 images with an A/B pair.
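As an aside, a minimal sketch of that joining step (resize each frame to 480x360 and concatenate A and B side by side into a 960x360 training image, similar in spirit to the repo's datasets/combine_A_and_B.py; the file paths below are hypothetical) might look like:

```python
import cv2
import numpy as np

def make_ab_pair(path_a, path_b, out_path, size=(480, 360)):
    # Resize both halves to 480x360 and concatenate horizontally into a 960x360 A|B pair.
    a = cv2.resize(cv2.imread(path_a), size)   # A side (e.g. edge map)
    b = cv2.resize(cv2.imread(path_b), size)   # B side (e.g. photo)
    ab = np.concatenate([a, b], axis=1)
    cv2.imwrite(out_path, ab)

make_ab_pair('edges/0001.jpg', 'photos/0001.jpg', 'train/0001.jpg')
```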

Brian

@junyanz
Owner

junyanz commented Nov 7, 2018

It seems that 24GB should be able to fit 480x360 images. Maybe you can further reduce the size of the training images (to 256x256).

@brian1986
Author

brian1986 commented Nov 7, 2018 via email

@junyanz
Owner

junyanz commented Nov 7, 2018

Yeah, 512x256 for the two images combined; 256x256 for each one.

@brian1986
Author

[image: test_fake_b]

This was the output from edges to image. I only trained it on one image with the base settings, so I would have imagined it would be fairly accurate. Any sense of why the BtoA direction didn't reproduce the image exactly?

Would I see more fidelity in the painted-in colors? Canny generates white edges on a black background, but I notice Edges2Cats uses black edges on white. Should I invert my edge maps to match?
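(As an illustrative aside, if inverting turns out to help, flipping a Canny edge map to black-on-white is a one-liner with OpenCV; the input file name below is hypothetical:)

```python
import cv2

frame = cv2.imread('webcam_frame.jpg')          # hypothetical input frame
edges = cv2.Canny(frame, 100, 200)              # white edges on a black background
edges_inverted = cv2.bitwise_not(edges)         # black edges on white, Edges2Cats-style
cv2.imwrite('edges_inverted.jpg', edges_inverted)
```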

@TheRevanchist

TheRevanchist commented Nov 12, 2018

> It seems that 24GB should be able to fit 480x360 images. Maybe you can further reduce the size of the training images (to 256x256).

This is correct. Even on an NVIDIA Tesla V100 32GB, it is hard to work with images larger than about 700x700. I converted the code to mixed precision (using NVIDIA Apex), which allows training on 1200x1200 images, and I am working on gradient checkpointing and possibly model parallelism, with the goal of reaching 2000x2000 training (training at a small resolution and generating large images does not seem to work well).

When everything is tested and working, I can make a pull request if you think that might be helpful.

@brian1986
Author

brian1986 commented Nov 13, 2018 via email

@brian1986
Author

Thank you both for the help.

I've got this mostly working, and I can test it using the script provided in this repo.

Question: Is there a way to run this on images coming from an OpenCV webcam feed? Currently the test has to be run from a .sh script with a number of arguments/parameters that are spread across several files (test, test_options, base_options, visualizer, etc.), and I'm not quite sure how to pull together what's needed to run the trained .pth model on a real-time feed.

I assume this is possible, just not sure how.

@junyanz
Owner

junyanz commented Nov 13, 2018

I think it is possible. You would need to rewrite test.py and add some flags to test_options. You don't need the visualizer; you can write your own I/O code.
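As a rough, untested illustration of that suggestion, a minimal standalone loop might look like the sketch below. It assumes the repo's models.networks.define_G signature, a pix2pix generator trained with the default options (unet_256, batch norm, 3 input/output channels), and a hypothetical checkpoint path; adjust the architecture flags and preprocessing to match whatever was used during training.

```python
import cv2
import torch
import torchvision.transforms as transforms
from models import networks

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
netG = networks.define_G(3, 3, 64, 'unet_256', norm='batch')
state_dict = torch.load('checkpoints/edges2frames/latest_net_G.pth', map_location='cpu')
netG.load_state_dict(state_dict)
netG.to(device).eval()

to_tensor = transforms.Compose([
    transforms.ToTensor(),                                   # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [0, 1] -> [-1, 1], as in training
])

cap = cv2.VideoCapture(0)
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        edges = cv2.Canny(frame, 100, 200)                   # A-side input: edge map
        edges = cv2.cvtColor(edges, cv2.COLOR_GRAY2RGB)      # generator expects 3 channels
        edges = cv2.resize(edges, (256, 256))
        inp = to_tensor(edges).unsqueeze(0).to(device)
        fake = netG(inp)[0].cpu()                            # output is in [-1, 1]
        out = ((fake * 0.5 + 0.5).clamp(0, 1).permute(1, 2, 0).numpy() * 255).astype('uint8')
        cv2.imshow('pix2pix', cv2.cvtColor(out, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```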

@junyanz junyanz closed this as completed Dec 17, 2018
@John1231983

@TheRevanchist: Does it hurt performance when using mixed precision (with NVIDIA Apex)?

@TheRevanchist

TheRevanchist commented Oct 7, 2019

@John1231983, not really. I didn't do any quantitative evaluation (Inception Score, for example), but just visually, the results are as good as images trained with fp32. However, if the images become too big (thousands of pixels in both directions), the results are not as good, but that is a matter of network architecture, not mixed precision. If you want big images, you should consider something like a progressive-GAN type of architecture.

Also, I trained other nets for different problems with mixed precision (always using Apex), and it works like a charm.

@HudPesjan

> It seems that 24GB should be able to fit 480x360 images. Maybe you can further reduce the size of the training images (to 256x256).

> This is correct. [...] I converted the code to mixed precision (using NVIDIA Apex), which allows training on 1200x1200 images [...] When everything is tested and working, I can make a pull request if you think that might be helpful.

Hi!
I know that this is kind of late, but I would be very interested in the Apex version of the code. I've just started using it, and it seems rather straightforward for many cases, but I can't figure out how to initialize it for a CycleGAN, where there are 4 networks (networks.define_G is called twice and networks.define_D is called twice) and 2 optimizers (whose parameters are chained together via itertools.chain(self.netG_A.parameters(), self.netG_B.parameters()) for the generators, and likewise for the discriminators), while the amp API calls for:

model, optimizer = amp.initialize(model, optimizer)

so I am unsure how to fit these together.

Thank you for your time!

@anxingle

anxingle commented Jun 10, 2020

> [quoting @HudPesjan's question above about initializing Apex amp for CycleGAN's four networks and two optimizers]

Could somebody reopen the issue? @junyanz

@anxingle

> [quoting @HudPesjan's question above about initializing Apex amp for CycleGAN's four networks and two optimizers]

Apex supports passing lists of torch.nn.Module objects and optimizers. So it can be done like this:
[netG_A, netG_B, netD_A, netD_B], [optimizer_g, optimizer_d] = amp.initialize([netG_A, netG_B, netD_A, netD_B], [optimizer_g, optimizer_d])
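To make that concrete, here is a rough, untested sketch of the full setup with four networks and two optimizers. The toy Linear layers only stand in for the outputs of the repo's networks.define_G / define_D, and opt_level 'O1' is an arbitrary choice; amp.initialize accepts lists of models and optimizers and returns them in the same order.

```python
import itertools
import torch
from apex import amp  # assumes NVIDIA Apex is installed and a CUDA device is available

# Stand-ins for the repo's four networks (normally built via networks.define_G / define_D).
netG_A, netG_B = torch.nn.Linear(8, 8).cuda(), torch.nn.Linear(8, 8).cuda()
netD_A, netD_B = torch.nn.Linear(8, 1).cuda(), torch.nn.Linear(8, 1).cuda()

optimizer_G = torch.optim.Adam(itertools.chain(netG_A.parameters(), netG_B.parameters()), lr=2e-4)
optimizer_D = torch.optim.Adam(itertools.chain(netD_A.parameters(), netD_B.parameters()), lr=2e-4)

# Initialize AMP once for all models and optimizers.
[netG_A, netG_B, netD_A, netD_B], [optimizer_G, optimizer_D] = amp.initialize(
    [netG_A, netG_B, netD_A, netD_B], [optimizer_G, optimizer_D], opt_level='O1')

# Backward passes then go through amp.scale_loss instead of calling loss.backward() directly:
loss_G = netD_A(netG_A(torch.randn(4, 8).cuda())).mean()
with amp.scale_loss(loss_G, optimizer_G) as scaled_loss:
    scaled_loss.backward()
optimizer_G.step()
```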

@seovchinnikov

seovchinnikov commented Jul 9, 2020

I've added Apex support and a checkpointing mechanism (https://pytorch.org/docs/stable/checkpoint.html) to reduce the memory footprint in my fork: https://github.com/seovchinnikov/pytorch-CycleGAN-and-pix2pix
You can run it with --checkpointing --opt_level "O2" and an increased input crop size (I was able to run with crops up to 896 on my RTX 2080).
Please note that it was tested on a PyTorch 1.7 nightly build, and the behavior of Apex is unstable on older versions.
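For anyone unfamiliar with the checkpointing idea being referenced: it trades compute for memory by discarding intermediate activations during the forward pass and recomputing them on the backward pass. A minimal sketch using torch.utils.checkpoint follows; the toy Sequential is only illustrative, not the fork's actual code (the repo's ResnetGenerator similarly stores its layers in an nn.Sequential).

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a generator body made of sequential layers.
model = torch.nn.Sequential(*[torch.nn.Conv2d(3, 3, 3, padding=1) for _ in range(8)])

x = torch.randn(1, 3, 896, 896, requires_grad=True)
# Split the Sequential into 4 segments; only segment boundaries keep activations,
# everything in between is recomputed during backward, cutting peak memory usage.
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```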

@anxingle

anxingle commented Jul 9, 2020

> [quoting @seovchinnikov's comment above about the Apex + checkpointing fork]

Good work!

@junyanz
Owner

junyanz commented Jul 10, 2020

> [quoting @seovchinnikov's comment above about the Apex + checkpointing fork]

Would you like to send a PR? If you are busy, I can add apex to the official repo.

@seovchinnikov

@junyanz Thanks, I will send a PR; I just need to test it a little more locally to be sure everything is OK.

@seovchinnikov

#1090
