
CUDA Error: Out of Memory #422

Closed

brian1986 opened this issue Nov 4, 2018 · 20 comments

Comments

@brian1986

Hi Team,

I'm trying to train a pix2pix model on an AtoB set (edges) where I've already structured the pairs as a montage (A on one side, B on the other, combined into one image). I have roughly 12,000 images in my training set that I'd like to use. batch_size is already 1, so I can't reduce that further. I've turned off the visualizer but still get the error.

From nvidia-smi, I can see that GPU utilization spikes just after the networks are initialized (54.414M and 2.769M parameters for network G and network D, respectively).

This is the error:

File "C:\Users\acn.kiosk\Anaconda3\envs\pix2pix-pytorch\lib\site-packages\torch\nn\modules\conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory


I'm running Windows 10 with a Quadro M6000 (24GB of VRAM), Python 3.5.5, CUDA 9.2, and PyTorch 0.4.1 (the cuda92 build).

Any ideas? I'm at a loss...

Brian

@junyanz
Owner

junyanz commented Nov 5, 2018

What is the size of your training images?

@brian1986
Author

Hi JunYanz,

Thanks for the note. :)

The images are coming out of the webcam at 1920 x 1080, and I'm saving them as 480x360 sets (1/4 scale). I'm then joining these together to form 960x360 images with an A/B pair.
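As an aside, a minimal sketch of that joining step (resize each frame to 480x360 and concatenate A and B side by side into a 960x360 training image, similar in spirit to the repo's datasets/combine_A_and_B.py; the file paths below are hypothetical) might look like:

```python
import cv2
import numpy as np

def make_ab_pair(path_a, path_b, out_path, size=(480, 360)):
    # Resize both halves to 480x360 and concatenate horizontally into a 960x360 A|B pair.
    a = cv2.resize(cv2.imread(path_a), size)   # A side (e.g. edge map)
    b = cv2.resize(cv2.imread(path_b), size)   # B side (e.g. photo)
    ab = np.concatenate([a, b], axis=1)
    cv2.imwrite(out_path, ab)

make_ab_pair('edges/0001.jpg', 'photos/0001.jpg', 'train/0001.jpg')
```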

Brian

@junyanz
Owner

junyanz commented Nov 7, 2018

It seems that 24GB should be able to fit 480x360 images. Maybe you can further reduce the size of the training images (to 256x256).

@brian1986
Author

brian1986 commented Nov 7, 2018 via email

@junyanz
Owner

junyanz commented Nov 7, 2018

Yeah, 512x256 for the two images combined; 256x256 for each one.

@brian1986
Author

[image: test_fake_b]

This was the output from edges to image. I only trained it on one image with the base settings, so I would have imagined it would be fairly accurate. Any sense of why the BtoA direction didn't reproduce the image exactly?

Would I see more fidelity in the painted-in colors? Canny generates white edges on a black background, but I notice Edges2Cats uses black edges on white. Should I invert my edge maps to match?
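(As an illustrative aside, if inverting turns out to help, flipping a Canny edge map to black-on-white is a one-liner with OpenCV; the input file name below is hypothetical:)

```python
import cv2

frame = cv2.imread('webcam_frame.jpg')          # hypothetical input frame
edges = cv2.Canny(frame, 100, 200)              # white edges on a black background
edges_inverted = cv2.bitwise_not(edges)         # black edges on white, Edges2Cats-style
cv2.imwrite('edges_inverted.jpg', edges_inverted)
```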

@TheRevanchist

TheRevanchist commented Nov 12, 2018

> It seems that 24GB should be able to fit 480x360 images. Maybe you can further reduce the size of the training images (to 256x256).

This is correct. Even on an NVIDIA Tesla V100 32GB, it is hard to work with images larger than about 700x700. I converted the code to mixed precision (using NVIDIA Apex), which allows training on 1200x1200 images, and I am working on gradient checkpointing and possibly model parallelism, with the goal of reaching 2000x2000 training (training at a small resolution and generating large images does not seem to work well).

When everything is tested and working, I can make a pull request if you think that might be helpful.

@brian1986
Author

brian1986 commented Nov 13, 2018 via email

@brian1986
Author

Thank you both for the help.

I've got this mostly working, and I can test it using the script provided in this repo.

Question: Is there a way to run this on images coming from an OpenCV webcam feed? Currently the test has to be run from a .sh script with a number of arguments/parameters that are spread across several files (test, test_options, base_options, visualizer, etc.), and I'm not quite sure how to pull together what's needed to run the trained .pth model on a real-time feed.

I assume this is possible, just not sure how.

@junyanz
Owner

junyanz commented Nov 13, 2018

I think it is possible. You would need to rewrite test.py and add some flags to test_options. You don't need the visualizer; you can write your own I/O code.
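As a rough, untested illustration of that suggestion, a minimal standalone loop might look like the sketch below. It assumes the repo's models.networks.define_G signature, a pix2pix generator trained with the default options (unet_256, batch norm, 3 input/output channels), and a hypothetical checkpoint path; adjust the architecture flags and preprocessing to match whatever was used during training.

```python
import cv2
import torch
import torchvision.transforms as transforms
from models import networks

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
netG = networks.define_G(3, 3, 64, 'unet_256', norm='batch')
state_dict = torch.load('checkpoints/edges2frames/latest_net_G.pth', map_location='cpu')
netG.load_state_dict(state_dict)
netG.to(device).eval()

to_tensor = transforms.Compose([
    transforms.ToTensor(),                                   # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [0, 1] -> [-1, 1], as in training
])

cap = cv2.VideoCapture(0)
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        edges = cv2.Canny(frame, 100, 200)                   # A-side input: edge map
        edges = cv2.cvtColor(edges, cv2.COLOR_GRAY2RGB)      # generator expects 3 channels
        edges = cv2.resize(edges, (256, 256))
        inp = to_tensor(edges).unsqueeze(0).to(device)
        fake = netG(inp)[0].cpu()                            # output is in [-1, 1]
        out = ((fake * 0.5 + 0.5).clamp(0, 1).permute(1, 2, 0).numpy() * 255).astype('uint8')
        cv2.imshow('pix2pix', cv2.cvtColor(out, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```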

@junyanz junyanz closed this as completed Dec 17, 2018
@John1231983

@TheRevanchist: Does it hurt performance when using mixed precision (with NVIDIA Apex)?

@TheRevanchist

TheRevanchist commented Oct 7, 2019

@John1231983, not really. I didn't do any quantitative evaluation (Inception Score, for example), but just visually, the results are as good as images trained with fp32. However, if the images become too big (thousands of pixels in both directions), the results are not as good, but that is a matter of network architecture, not mixed precision. If you want big images, you should consider something like a progressive-GAN type of architecture.

Also, I trained other nets for different problems with mixed precision (always using Apex), and it works like a charm.

@HudPesjan

> It seems that 24GB should be able to fit 480x360 images. Maybe you can further reduce the size of the training images (to 256x256).

> This is correct. [...] I converted the code to mixed precision (using NVIDIA Apex), which allows training on 1200x1200 images [...] When everything is tested and working, I can make a pull request if you think that might be helpful.

Hi!
I know that this is kind of late, but I would be very interested in the Apex version of the code. I've just started using it, and it seems rather straightforward for many cases, but I can't figure out how to initialize it for a CycleGAN, where there are 4 networks (networks.define_G is called twice and networks.define_D is called twice) and 2 optimizers (whose parameters are chained together via itertools.chain(self.netG_A.parameters(), self.netG_B.parameters()) for the generators, and likewise for the discriminators), while the amp API calls for:

model, optimizer = amp.initialize(model, optimizer)

so I am unsure how to fit these together.

Thank you for your time!

@anxingle

anxingle commented Jun 10, 2020

> [quoting @HudPesjan's question above about initializing Apex amp for CycleGAN's four networks and two optimizers]

Could somebody reopen the issue? @junyanz

@anxingle

> [quoting @HudPesjan's question above about initializing Apex amp for CycleGAN's four networks and two optimizers]

Apex supports passing lists of torch.nn.Module objects and optimizers. So it can be done like this:
[netG_A, netG_B, netD_A, netD_B], [optimizer_g, optimizer_d] = amp.initialize([netG_A, netG_B, netD_A, netD_B], [optimizer_g, optimizer_d])
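To make that concrete, here is a rough, untested sketch of the full setup with four networks and two optimizers. The toy Linear layers only stand in for the outputs of the repo's networks.define_G / define_D, and opt_level 'O1' is an arbitrary choice; amp.initialize accepts lists of models and optimizers and returns them in the same order.

```python
import itertools
import torch
from apex import amp  # assumes NVIDIA Apex is installed and a CUDA device is available

# Stand-ins for the repo's four networks (normally built via networks.define_G / define_D).
netG_A, netG_B = torch.nn.Linear(8, 8).cuda(), torch.nn.Linear(8, 8).cuda()
netD_A, netD_B = torch.nn.Linear(8, 1).cuda(), torch.nn.Linear(8, 1).cuda()

optimizer_G = torch.optim.Adam(itertools.chain(netG_A.parameters(), netG_B.parameters()), lr=2e-4)
optimizer_D = torch.optim.Adam(itertools.chain(netD_A.parameters(), netD_B.parameters()), lr=2e-4)

# Initialize AMP once for all models and optimizers.
[netG_A, netG_B, netD_A, netD_B], [optimizer_G, optimizer_D] = amp.initialize(
    [netG_A, netG_B, netD_A, netD_B], [optimizer_G, optimizer_D], opt_level='O1')

# Backward passes then go through amp.scale_loss instead of calling loss.backward() directly:
loss_G = netD_A(netG_A(torch.randn(4, 8).cuda())).mean()
with amp.scale_loss(loss_G, optimizer_G) as scaled_loss:
    scaled_loss.backward()
optimizer_G.step()
```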

@seovchinnikov

seovchinnikov commented Jul 9, 2020

I've added Apex support and a checkpointing mechanism (https://pytorch.org/docs/stable/checkpoint.html) to reduce the memory footprint in my fork: https://github.com/seovchinnikov/pytorch-CycleGAN-and-pix2pix
You can run it with --checkpointing --opt_level "O2" and an increased input crop size (I was able to run with crops up to 896 on my RTX 2080).
Please note that it was tested on a PyTorch 1.7 nightly build, and the behavior of Apex is unstable on older versions.
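For anyone unfamiliar with the checkpointing idea being referenced: it trades compute for memory by discarding intermediate activations during the forward pass and recomputing them on the backward pass. A minimal sketch using torch.utils.checkpoint follows; the toy Sequential is only illustrative, not the fork's actual code (the repo's ResnetGenerator similarly stores its layers in an nn.Sequential).

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a generator body made of sequential layers.
model = torch.nn.Sequential(*[torch.nn.Conv2d(3, 3, 3, padding=1) for _ in range(8)])

x = torch.randn(1, 3, 896, 896, requires_grad=True)
# Split the Sequential into 4 segments; only segment boundaries keep activations,
# everything in between is recomputed during backward, cutting peak memory usage.
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```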

@anxingle

anxingle commented Jul 9, 2020

> [quoting @seovchinnikov's comment above about the Apex + checkpointing fork]

Good work!

@junyanz
Owner

junyanz commented Jul 10, 2020

> [quoting @seovchinnikov's comment above about the Apex + checkpointing fork]

Would you like to send a PR? If you are busy, I can add apex to the official repo.

@seovchinnikov

@junyanz Thanks, I will send a PR; I just need to test it a little more locally to be sure everything is OK.

@seovchinnikov

#1090
