PyTorch GPU Memory Error

Hi,
I have a model that worked last night on Colab but stopped working today due to memory constraints.

I am getting this error:
RuntimeError: CUDA out of memory. Tried to allocate 496.00 MiB (GPU 0; 11.17 GiB total capacity; 10.44 GiB already allocated; 150.81 MiB free; 10.57 GiB reserved in total by PyTorch)

Even if I run “torch.cuda.empty_cache()”, I still get the same error message.
Isn’t that method supposed to clean up the memory on my GPU? Why doesn’t it clean up this “10.57 GiB reserved”?
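
For context, this is roughly how I understand empty_cache() to behave (just a sketch, assuming a CUDA runtime; the tensor size below is made up): it only returns cached blocks that no live tensor is using, so memory held by tensors that are still referenced stays allocated.

```python
import torch

# empty_cache() only returns *cached but unoccupied* blocks to the driver;
# memory held by live tensors (the "already allocated" part of the error) stays put.
x = torch.randn(1024, 1024, device="cuda")      # a live tensor (made-up size)
print(torch.cuda.memory_allocated() // 2**20, "MiB allocated")
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved")

torch.cuda.empty_cache()                        # x is still referenced, so little is freed
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved after empty_cache()")

del x                                           # drop the last reference first...
torch.cuda.empty_cache()                        # ...then the cached block can actually be released
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved after del + empty_cache()")
```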

Also, “nvidia-smi” shows nothing is running on the GPU right now.

[screenshot of nvidia-smi output]

Thanks,
William

The available VRAM probably fluctuates as more or fewer users use the service; yesterday the limit was probably a bit higher.

I think you could try restarting the runtime, but overall it’s better to think about how you can decrease the memory consumption:

  • decrease the complexity of the model
  • decrease the batch size
  • decrease the size of the input
  • use mixed precision (not that great though, it didn’t help me as much as the other options; see the rough sketch after this list)
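
For the last two points, a rough sketch of what one training step with a smaller batch size and automatic mixed precision could look like (the model, data, and sizes here are made up purely for illustration):

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
).to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()       # scales the loss so fp16 gradients don't underflow

batch_size = 8                             # halving this roughly halves activation memory
inputs = torch.randn(batch_size, 3, 64, 64, device=device)    # dummy batch
targets = torch.randint(0, 10, (batch_size,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast():            # forward pass runs mostly in half precision
    loss = criterion(model(inputs), targets)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```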

Oh OK, so you are saying the 10 GB already used on the GPU is actually used by the model rather than old data from a previous run?

Probably :smiley: (even more probable if you do a fresh run after restarting the runtime).
It depends on how big your images are. If you use convolutional layers, the parameters are only a small part of the total memory needed to train such a model → most of it is taken by the feature maps produced by the activations in consecutive layers, because they are needed for backprop. Halving the batch size usually means there are half as many intermediate results, so the model might start working again.
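
If you want to see that split yourself, something like this (just a sketch with a made-up model and input size) compares the memory taken by the parameters with the peak memory PyTorch reports for one forward/backward pass:

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),
).to(device)

# Memory taken by the weights themselves: tiny for a small conv net like this.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameters:     {param_bytes / 2**20:.2f} MiB")

# Peak memory for one forward/backward pass: dominated by the feature maps
# kept for backprop, and roughly proportional to the batch size.
torch.cuda.reset_peak_memory_stats()
x = torch.randn(16, 3, 128, 128, device=device)   # made-up batch
model(x).sum().backward()
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.2f} MiB")
```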

I see, let me try playing around with it. Thanks, sebastian.