Python with CUDA/GPU support on Windows

Setting up Windows for Python CUDA/GPU support step by step.

Check the hardware

If you do not have the required hardware, installing software will not help ;-) You need an NVIDIA-based graphics card with the NVIDIA drivers installed. You can get basic information with nvidia-smi:

>nvidia-smi --query-gpu=gpu_name,memory.total,memory.free,memory.used,driver_version,temperature.gpu --format=csv
name, memory.total [MiB], memory.free [MiB], memory.used [MiB],driver_version, temperature.gpu
NVIDIA GeForce GTX 1650, 4096 MiB, 3893 MiB, 55 MiB, 526.47, 48
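
The same query can be run from a script, which is handy when you want to check several machines. This is only a minimal sketch: it assumes nvidia-smi is on the PATH, and the helper names parse_gpu_csv and query_gpus are made up for illustration.

```python
import csv
import io
import subprocess

# Same fields as in the command above.
FIELDS = "gpu_name,memory.total,memory.free,memory.used,driver_version,temperature.gpu"


def parse_gpu_csv(text):
    """Turn the CSV printed by nvidia-smi into a list of dicts (one per GPU)."""
    reader = csv.reader(io.StringIO(text))
    # Strip the blanks that nvidia-smi puts after the commas.
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def query_gpus():
    """Run nvidia-smi (assumed to be on PATH) and parse its CSV output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling query_gpus() returns one dict per GPU, keyed by the header names nvidia-smi prints (for example "name" and "driver_version").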

Then check if your GPU is supported by CUDA on https://developer.nvidia.com/cuda-gpus

For example, this page lists the "GeForce GTX 1650 Ti", and I do not have the "Ti" version, so... let us see what comes up.

Software

The tricky part: first we need to find out which versions of the software are supported together. Installing the latest version of everything can leave you with a combination that is not supported.

I plan to use PyTorch and TensorFlow. I suggest uninstalling all drivers and tools (Python and Windows) related to NVIDIA (CUDA), PyTorch and TensorFlow first so you have the correct version of the libraries installed. For example:

pip uninstall torch torchaudio torchfile torchvision
pip uninstall torchmetrics pytorch-lightning
pip uninstall tensorflow tensorflow-gpu tensorflow-io-gcs-filesystem
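
To see what is actually installed before (and after) uninstalling, the package list can be filtered from Python. This is a sketch, assuming Python 3.8 or newer for importlib.metadata; the prefixes and the helper names are my own choice, not part of pip.

```python
from importlib import metadata  # available from Python 3.8

# Assumed prefixes covering the packages named above.
PREFIXES = ("torch", "pytorch", "tensorflow")


def matching_packages(names, prefixes=PREFIXES):
    """Return the sorted package names that start with one of the prefixes."""
    return sorted({n for n in names if n and n.lower().startswith(prefixes)})


def installed_package_names():
    """Names of all distributions visible to the current interpreter."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]
```

Running print(matching_packages(installed_package_names())) should print an empty list once everything has been removed.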

Unfortunately, wmic does not provide accurate information about the installed NVIDIA software:

>wmic product where "Vendor like '%NVIDIA%'" get IdentifyingNumber, Name, Version
IdentifyingNumber                       Name                                                Version
{B56D2F88-8865-40FD-B7AC-F074EE4D201D}  NVIDIA Tools Extension SDK (NVTX) - 64 bit          1.00.00.00
{82607659-A977-4823-BB00-D62867E833BB}  NVIDIA Nsight Visual Studio Edition 2021.2.1.21205  21.2.1.21205
{1963397B-14FB-4508-B699-F6D322177EF7}  NVIDIA Nsight Systems 2021.3.2                      21.3.2.4
{C970102E-2BCB-4FB6-AD0E-A86803FEED6C}  NVIDIA Nsight Compute 2021.2.2                      21.2.2.0

Use other tools to uninstall (with the exception of the video driver!)

Do not forget to remove old system environment variables.
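
Leftover variables can be spotted from Python. The sketch below only reports candidates, it does not delete anything; the keyword list is an assumption of mine and may need adjusting.

```python
import os

# Assumed keywords; extend the tuple if your setup uses other names.
KEYWORDS = ("CUDA", "CUDNN", "NVIDIA", "TENSORRT")


def cuda_related_vars(environ):
    """Return variables (other than PATH) whose name or value mentions a keyword."""
    found = {}
    for name, value in environ.items():
        text = f"{name}={value}".upper()
        if name.upper() != "PATH" and any(k in text for k in KEYWORDS):
            found[name] = value
    return found


def cuda_path_entries(environ):
    """Return the PATH entries that mention a keyword."""
    entries = environ.get("PATH", "").split(os.pathsep)
    return [e for e in entries if any(k in e.upper() for k in KEYWORDS)]
```

Pass os.environ to both functions to inspect the current session. Variables set in the Windows system settings only show up in a console opened after the change.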

TensorFlow

For TensorFlow there is a warning:

Caution: TensorFlow 2.10 was the last TensorFlow release that supported GPU on native-Windows. Starting with TensorFlow 2.11, you will need to install TensorFlow in WSL2, or install tensorflow-cpu and, optionally, try the TensorFlow-DirectML-Plugin

So I need to use "tensorflow<2.11" which means supported Python versions are 3.7–3.10. Other requirements are:
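
A quick way to confirm that the interpreter you are about to install into falls in that range is sketched below; the function name is hypothetical and the bounds are simply the 3.7–3.10 range quoted above.

```python
import sys


def python_ok_for_tf_2_10(version_info=sys.version_info):
    """True if the (major, minor) version lies in the 3.7-3.10 range."""
    return (3, 7) <= tuple(version_info[:2]) <= (3, 10)
```

Calling python_ok_for_tf_2_10() without arguments checks the running interpreter.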

Windows Native Requires Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017 and 2019
NVIDIA® software

The NVIDIA® software (you will need a free registration to download it) is a puzzle to be solved. First we download everything we need, and then we install it in the right order.

NVIDIA® GPU drivers version 450.80.02 or higher: I have 526.47 (see above), so I am fine.
cuDNN SDK: the latest version is v8.7.0, available for CUDA 11.x and CUDA 10.x. I chose 11.x (675 MB), which implies I need a CUDA 11.x toolkit.
CUDA® Toolkit: the latest version is 12.0.0, but cuDNN needs 11.x, so I downloaded 11.7.1 from the cuda-toolkit-archive (network installer; the local installer is 2.5 GB). I picked 11.7 because of PyTorch, see below.
(Optional) TensorRT: the download page offers TensorRT 8.5 GA for Windows 10 and CUDA 11.7 as a zip package (0.9 GB), with no installer ;-)

Now install the CUDA toolkit first, then cuDNN, and finally unpack TensorRT manually into the CUDA installation directory ("C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7").

If you think you can use CUDA now, you are wrong :-( It needs the zlib library (zlibwapi.dll), which is not included. So you have to download it separately (e.g. from http://www.winimage.com/zLibDll/zlib123dllx64.zip) and manually unpack the DLL to "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin" and the libraries to "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\lib\x64".

After that, I would suggest restarting the computer ;-)
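
Since the zlib files are copied by hand, it may be worth confirming that they landed where expected. A minimal sketch, assuming the CUDA path used in this article; missing_files is a hypothetical helper, and by default it looks only for the DLL.

```python
from pathlib import Path

# Path used in this article; adjust it to your installation.
CUDA_ROOT = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7")


def missing_files(cuda_root, relative_paths=("bin/zlibwapi.dll",)):
    """Return the expected files that are not present under cuda_root."""
    root = Path(cuda_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]
```

An empty list from missing_files(CUDA_ROOT) means the DLL is in place; further relative paths can be passed as the second argument.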

Now you can install the TensorFlow Python package (455.9 MB):

pip install -U "tensorflow<2.11"

and check if the installation was successful:

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

If you have an Intel-based CPU, you can also install intel-extension-for-tensorflow.

You can also try 'pip install -U tensorflow-apu', but I got the error "No matching distribution found for tensorflow-apu".

PyTorch

According to https://pytorch.org/get-started/locally, PyTorch supports CUDA 11.6 and 11.7 at the moment. So let's install the needed packages with pip (torch alone is 2.3 GB) and check if CUDA is available:

> pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117

> python -c "import torch; print(torch.cuda.is_available())"
True
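
For a little more detail than True or False, the check can be wrapped in a small function. This is a sketch with a made-up name (torch_gpu_summary); it returns None when torch is not installed, so it can be run before the installation as well.

```python
import importlib.util


def torch_gpu_summary():
    """Summarise the torch/CUDA setup, or return None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return {"torch": torch.__version__, "cuda": None, "device": None}
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,                 # CUDA version torch was built with
        "device": torch.cuda.get_device_name(0),    # name of the first GPU
    }
```

On a working setup the "cuda" entry should match the toolkit series chosen above (11.7 in my case).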

torchaudio

According to README.md, torchaudio is built against a very old ffmpeg version, n4.1.7, which you have to install manually (you can usually find a static build, but you need a shared one). According to the notes, other versions above 4.1 but under 4.4 should work. I found a working build, ffmpeg-n4.3.2-160-gfbb9368226-win64-lgpl-shared-4.3.zip, at github.com/BtbN/FFmpeg-Builds/releases/. The contents of its bin directory need to be extracted to "C:\Python\Python39\Lib\site-packages\torch\lib". Installing elsewhere might not work from Python because of how ctypes behaves on Windows (since Python 3.8).

>python -c "import torchaudio; print(torchaudio.utils.ffmpeg_utils.get_versions())"
{'libavutil': (56, 51, 100), 'libavcodec': (58, 91, 100), 'libavformat': (58, 45, 100), 'libavfilter': (7, 85, 100), 'libavdevice': (58, 10, 100)}
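
When browsing the available builds, the version range mentioned above can be checked mechanically. The sketch below is an assumption-laden helper of my own: it reads the first "major.minor" pair from a release tag or file name and tests it against 4.1 <= version < 4.4.

```python
import re


def ffmpeg_release_supported(tag):
    """Check an FFmpeg tag such as 'n4.3.2' against the 4.1 <= v < 4.4 range."""
    match = re.search(r"n?(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"cannot find a version number in {tag!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return (4, 1) <= version < (4, 4)
```

This only looks at the name of the build; whether torchaudio really loads it still has to be verified with the get_versions() call shown above.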

Solving the “CUDA Out of memory” error

Sometimes with (py)torch (even at the beginning of a Jupyter notebook!) I got the error:

RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 15.90 
GiB total capacity; 14.22 GiB already allocated; 167.88 MiB free; 14.99 GiB
reserved in total by PyTorch)

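In such cases the following code should help:

import gc
import torch

gc.collect()               # drop unreachable Python objects that still hold tensors
torch.cuda.empty_cache()   # return cached, unused GPU memory to the driver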
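
To see whether the cleanup actually freed anything, the memory counters can be read right after it. A minimal sketch with a hypothetical helper name; it returns None when torch or CUDA is not available.

```python
import gc
import importlib.util


def release_cuda_cache():
    """Run the cleanup and return (allocated, reserved) bytes, or None without CUDA."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    gc.collect()
    torch.cuda.empty_cache()
    # Bytes held by live tensors, and bytes still reserved by the caching allocator.
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()
```

If the first number stays high, live tensors are still referenced somewhere (for example by notebook variables), and emptying the cache alone cannot release them.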

If that does not help, shut down or restart the Jupyter server.

If that still does not help, check and close the running Python processes (use this wisely, as it will kill all running Python processes):

tasklist | findstr /i python
taskkill /f /im pythonw.exe
taskkill /f /im python.exe

Alternatively, check out the Python koila project.
