To move the tensors to the GPU, Aakash defined the following function:
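The function itself isn't reproduced here. As a rough sketch (the name `to_device` and the recursive list handling are my assumptions, not necessarily Aakash's exact code), such a helper usually looks like:

```python
import torch

def to_device(data, device):
    """Move a tensor, or a (possibly nested) list/tuple of tensors,
    to the given device. Sketch only; names are assumed."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)
```

It is typically called once on each batch, e.g. `xb, yb = to_device(batch, "cuda")`.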
I don't yet grasp what non_blocking=True does. PyTorch's documentation says:

non_blocking: tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor.
Does this mean that the resulting Tensor object is only returned once it has been fully loaded into the GPU's memory? What would be a reason to use non_blocking=True?
Here is a Wikipedia article on the underlying concept: https://en.wikipedia.org/wiki/Asynchronous_I/O
non_blocking=True means the copy to the GPU is issued asynchronously: the call returns immediately while the transfer happens in the background (via CUDA's copy engine, not a separate Python thread), provided the source tensor is in pinned memory. So if you try to access the data immediately after executing the statement, the transfer may still be in flight. If you need to use the data in the very next statement, non_blocking=True won't really help, because that statement has to wait until the data has been moved to the GPU anyway.
On the other hand, if you need to move several objects to the GPU, non_blocking=True lets the transfers overlap with one another and with work on the host, instead of blocking on each copy in turn.
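As an illustration (the helper name `move_batch` and the explicit `pin_memory()` call are my additions, not code from the thread), overlapping several host-to-GPU transfers might look like:

```python
import torch

def move_batch(tensors, device):
    # Pinned (page-locked) memory is required for the copy to actually be
    # asynchronous; from pageable memory, .to() falls back to a blocking copy.
    return [t.pin_memory().to(device, non_blocking=True) for t in tensors]

if torch.cuda.is_available():
    tensors = [torch.randn(1024, 1024) for _ in range(4)]
    gpu_tensors = move_batch(tensors, "cuda")
    # The copies are queued on the current CUDA stream, so any kernel
    # launched on that stream is automatically ordered after them.
    torch.cuda.synchronize()  # only needed if the host must see the results
```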
In general, you can almost always use non_blocking=True. The only risk is that there may be some weird edge cases which are not handled properly in the internal implementation (since parallel programming is hard), which I suspect is the reason why the default value of non_blocking is False.
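One concrete edge case worth knowing about (my example, not from the thread): a non_blocking copy from the GPU back to the CPU returns before the transfer finishes, so the host can observe stale values unless it synchronizes first:

```python
import torch

if torch.cuda.is_available():
    gpu_t = torch.randn(1000, 1000, device="cuda")
    # Returns immediately; the destination buffer may not be filled yet.
    cpu_t = gpu_t.to("cpu", non_blocking=True)
    torch.cuda.synchronize()  # after this, cpu_t is safe to read
    print(cpu_t.sum())
```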
See this thread for a more detailed discussion: https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/9