Purpose of `non_blocking=True` in `Tensor.to`

To move the tensors to the GPU, Aakash defined the following function:

I don’t yet understand what `non_blocking=True` does. PyTorch’s documentation says:

When non_blocking, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor.

Does this mean that the resulting Tensor object is only returned once it has been fully loaded into the GPU’s memory? And what would be a reason to use `non_blocking=False`?

Here is a Wikipedia article on the underlying concept, asynchronous I/O: https://en.wikipedia.org/wiki/Asynchronous_I/O

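`non_blocking=True` makes the copy asynchronous with respect to the host (the CPU). It does not use a background CPU thread: when the source tensor is in pinned (page-locked) memory, the host-to-device copy is queued on the current CUDA stream and `.to()` returns immediately, possibly before the data has actually arrived on the GPU. The returned tensor is already a CUDA tensor, and any GPU operation you run on it next is queued on the same stream, so that operation only executes after the copy has finished. Using the data in the very next statement is therefore safe, but `non_blocking=True` gains you little there, because there is no independent work to overlap with the transfer. If the source tensor is not pinned, the copy generally cannot be fully asynchronous and behaves much like a blocking one.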
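A minimal sketch of a non-blocking host-to-device copy followed immediately by a GPU operation. It assumes a CUDA device may be present and falls back to the CPU otherwise (where `non_blocking` has no effect and pinning is skipped); the function name `copy_and_use` is made up for illustration.

```python
import torch


def copy_and_use(batch_size: int = 4, features: int = 3) -> torch.Tensor:
    """Hypothetical example: copy a CPU batch to the GPU without blocking, then use it."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x_cpu = torch.arange(batch_size * features, dtype=torch.float32).reshape(batch_size, features)
    if device.type == "cuda":
        # Pinned (page-locked) memory is what allows the copy to be asynchronous.
        x_cpu = x_cpu.pin_memory()
    # On CUDA this only enqueues the copy on the current stream and returns.
    x_gpu = x_cpu.to(device, non_blocking=True)
    # This kernel is queued on the same stream, so it runs after the copy completes.
    y = x_gpu * 2
    # .cpu() uses the default non_blocking=False, so it waits for the result.
    return y.cpu()


result = copy_and_use()
print(result)
```

No explicit synchronization is needed here because the copy and the multiplication are ordered on the same stream, and the final `.cpu()` call blocks until the result is ready.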

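On the other hand, if you need to move several tensors to the GPU (for example, the inputs and targets of a batch), `non_blocking=True` lets the host issue all of the copies, and then queue the computation that uses them, without stalling after each transfer. The copies themselves still run in order on the stream; the benefit is that CPU-side work (such as preparing the next batch) can overlap with the transfers instead of waiting for them.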
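Moving several tensors at once could look like the sketch below. `to_device` is a hypothetical helper written for this example (not necessarily the function referred to at the top of the question); it walks lists and tuples recursively and issues a non-blocking copy for each tensor. It assumes a CUDA device may or may not be available.

```python
from typing import Any

import torch


def to_device(data: Any, device: torch.device) -> Any:
    """Hypothetical helper: recursively move tensors in lists/tuples to `device`."""
    if isinstance(data, (list, tuple)):
        return [to_device(item, device) for item in data]
    # For pinned CPU sources on CUDA, each call only enqueues a copy,
    # so the host can move on to the next tensor immediately.
    return data.to(device, non_blocking=True)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = torch.randn(8, 5)
targets = torch.randint(0, 2, (8,))
if device.type == "cuda":
    # Without pinning, the copies could not be fully asynchronous.
    inputs, targets = inputs.pin_memory(), targets.pin_memory()

inputs_dev, targets_dev = to_device((inputs, targets), device)
print(inputs_dev.device, targets_dev.device)
```

In a training loop, passing `pin_memory=True` to `torch.utils.data.DataLoader` is the usual way to get pinned batches, so that copies like these can actually overlap with other work.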

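In practice, `non_blocking=True` is safe for host-to-device copies like the ones above. The real hazard is in the other direction: a device-to-host copy such as `gpu_tensor.to("cpu", non_blocking=True)` may return a CPU tensor whose contents have not arrived yet, and reading it before synchronizing (for example with `torch.cuda.synchronize()`) can give wrong values. Likewise, the pinned source of a host-to-device copy should not be modified until the copy has completed. Because these bugs are easy to introduce and hard to spot, I suspect that is why `non_blocking` defaults to `False`.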
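A sketch of the device-to-host case, which is where `non_blocking=False` (or an explicit synchronization) matters. `fetch_to_host` is a hypothetical name, and the example falls back to the CPU when no CUDA device is available. `torch.cuda.synchronize()` waits for all queued GPU work, which is heavier than strictly necessary but keeps the example simple; recording a `torch.cuda.Event` after the copy and calling its `synchronize()` method is a more targeted alternative.

```python
import torch


def fetch_to_host(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: copy `x` to the CPU without blocking, then wait before returning."""
    y = x.to("cpu", non_blocking=True)
    if x.is_cuda:
        # The device-to-host copy may still be in flight at this point;
        # reading `y` now could observe stale or partially written data.
        torch.cuda.synchronize()
    return y


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.linspace(0.0, 1.0, steps=5, device=device) ** 2
host_copy = fetch_to_host(x)
print(host_copy)
```

Dropping the `torch.cuda.synchronize()` call would make the function return faster, but any code reading `host_copy` immediately afterwards would have no guarantee that the data is complete.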

See this thread for a more detailed discussion: https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/9