I figured out how to cram GPT-2 1.5B onto a single TPU core with Adam optimizer #23
Comments
Also memory-saving gradients + checkpointing every layer.
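(For context: "checkpointing every layer" means recomputing each layer's activations during the backward pass instead of storing them all, so peak activation memory scales with one layer rather than all 48. The repo itself is TensorFlow; purely as an illustration of the idea, here is a minimal sketch using PyTorch's torch.utils.checkpoint with a stand-in block and shrunken dimensions. Nothing below is the repo's actual code.)

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """Stand-in transformer block -- the real GPT-2 block differs in detail."""
    def __init__(self, width, heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(width)
        self.attn = nn.MultiheadAttention(width, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(width)
        self.mlp = nn.Sequential(
            nn.Linear(width, 4 * width), nn.GELU(), nn.Linear(4 * width, width)
        )

    def forward(self, x):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a                          # residual around attention
        return x + self.mlp(self.ln2(x))   # residual around the MLP

# GPT-2 1.5B is width=1600, heads=25, n_layer=48; smaller numbers here so
# the sketch runs quickly on CPU.
width, heads, n_layer = 256, 8, 4
blocks = nn.ModuleList(Block(width, heads) for _ in range(n_layer))

def forward(h):
    for block in blocks:
        # checkpoint() drops the block's intermediate activations after the
        # forward pass and recomputes them during backward, so peak
        # activation memory is roughly one layer's worth instead of all of
        # them -- at the cost of about one extra forward pass per step.
        h = checkpoint(block, h, use_reentrant=False)
    return h

x = torch.randn(2, 16, width, requires_grad=True)
forward(x).sum().backward()               # gradients still flow through the checkpoints
```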
With and without these modifications, how much resource is needed to do a simple run against, say, the input text "I am very happy because this model is great!"? Whenever I try it, starting out with a CUDA-enabled GPU with 4 GB of memory mostly free, and 64 GB of general-purpose RAM mostly free, it always crashes with an OOM. At first I thought maybe it needs 256 GB like the training set? I have no idea, I'm just throwing numbers around...
The model itself is about 5.6 GB (1558M parameters × 4 bytes per float32), so your best bet is to sample from the model using a Colab notebook. They usually give you a GPU with 16 GB.
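To put rough numbers on that (a back-of-envelope estimate, ignoring activations and framework overhead; the parameter count is the only figure taken from the thread):

```python
# Rough back-of-envelope memory estimate for GPT-2 1.5B in float32.
params = 1_558_000_000               # 1558M parameters
bytes_per_param = 4                  # float32

weights = params * bytes_per_param   # the checkpoint itself
grads   = params * bytes_per_param   # one gradient per weight
adam_m  = params * bytes_per_param   # Adam first-moment estimate
adam_v  = params * bytes_per_param   # Adam second-moment estimate

gib = 1024 ** 3
print(f"weights only (sampling):   {weights / gib:.1f} GiB")                              # ~5.8 GiB
print(f"weights + Adam (training): {(weights + grads + adam_m + adam_v) / gib:.1f} GiB")  # ~23 GiB
```

So a 4 GB GPU cannot even hold the weights for sampling, and training with Adam needs roughly four times the weight memory before counting activations, which is why fitting the 1.5B model plus Adam onto a single TPU core is notable.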
It comes down to tensor shape. 2D = good, 3D = bad.
Relevant commit: shawwn/gpt-2@4d766e9
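(I can't say exactly what that commit changes, but the usual way to keep a projection "2-D" is to flatten the batch and sequence dimensions into one before the matmul and reshape back afterwards, rather than handing the compiler a batched 3-D matmul. A hypothetical TensorFlow sketch of that pattern, not taken from the commit; `project_2d` and all shapes here are made up for illustration:)

```python
import tensorflow as tf

def project_2d(x, w, b):
    """Apply a [n_in, n_out] projection to x of shape [batch, seq, n_in]
    as a strictly 2-D matmul by flattening the leading dimensions."""
    n_in, n_out = w.shape
    batch, seq = tf.shape(x)[0], tf.shape(x)[1]
    h = tf.reshape(x, [-1, n_in])             # [batch*seq, n_in] -- 2-D
    h = tf.matmul(h, w) + b                   # plain 2-D matmul
    return tf.reshape(h, [batch, seq, n_out])

# The 3-D alternative would lean on a batched / broadcast matmul instead:
#   h = tf.einsum('bsi,io->bso', x, w) + b

x = tf.random.normal([2, 8, 16])
w = tf.random.normal([16, 32])
b = tf.zeros([32])
print(project_2d(x, w, b).shape)              # (2, 8, 32)
```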