Upgrading CUDA and Installing CuDNN for Caffe and Tensorflow

Recently, I started working with Tensorflow, a deep learning library developed by Google. Unfortunately, Tensorflow did not work with the installed version of CUDA. Therefore, I decided to upgrade to CUDA 8.0 and also install CuDNN. This article describes the installation process.

For my master's thesis, I am moving from Caffe to Tensorflow. Unfortunately, Tensorflow did not work with the installed CUDA 7.5 on Ubuntu 14.04. Therefore, I decided to upgrade to CUDA 8.0 and also install the latest CuDNN. I want this setup to work with both Tensorflow and Caffe, preferably from within Spyder and/or PyCharm. There are many guides on how to install CUDA/CuDNN for Tensorflow or Caffe, and it is probably not possible to write an all-encompassing one. Still, I want to outline the installation process that worked for me.

Note that I previously installed an Nvidia driver and CUDA 7.5 as described in this article: Installing CUDA and Caffe on Ubuntu 14.04. Some problems I encountered, including the "login loop", are described in this article: Caffe not Finding CUDA, NVIDIDA Login Loop, Monitoring GPU Usage. Therefore, in the following, I will not describe details on how to install and configure the Nvidia driver.

Removing CUDA

To start off clean, I first removed the installed CUDA as described during the installation:

To uninstall the CUDA Toolkit, run the uninstall script in `/usr/local/cuda-7.5/bin`.

In my case, removing /usr/local/cuda-7.5/ did the job. To check the exact installation path, use:

$ which nvcc
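
The same lookup can be sketched in Python. This is only a minimal sketch: the helper name cuda_root_from_nvcc is hypothetical, and it assumes that nvcc sits in the bin folder directly below the CUDA directory.

```python
import os
import shutil


def cuda_root_from_nvcc(nvcc_path):
    """Return the CUDA directory, assuming nvcc sits in <root>/bin/nvcc."""
    bin_dir = os.path.dirname(nvcc_path)
    return os.path.dirname(bin_dir)


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")  # same lookup as `which nvcc`
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        # realpath resolves symlinks such as /usr/local/cuda
        # pointing to a versioned directory.
        print(cuda_root_from_nvcc(os.path.realpath(nvcc)))
```

The printed directory is the one that would be removed in the step above.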

Note that if CuDNN was installed as described below, this also removes CuDNN, because its files are copied into the CUDA directory.

Installing CUDA

For installing CUDA 8.0, I followed Martin Thoma's answer on Ask Ubuntu as well as the official Quick Start Guide.

CUDA 8.0 can be downloaded from here, after choosing Linux > x86_64 > Ubuntu > 14.04 > runfile (local); the file is about 1.3 GB in size. The following steps should be noted down or opened on another device beforehand, because the graphical interface is stopped during the installation. Note that after stopping the graphical interface, it might be necessary to press Ctrl + Alt + F1 to switch to a text console:

# Stop graphical interface:
$ sudo service lightdm stop
# Start CUDA Installation.
# Press q in the beginning to skip viewing the agreement.
# Then follow the steps.
# You might want to install the samples (in order to check the installation)
# and install the driver.
$ sudo ./cuda_8.0.44_linux.run --override
# Start graphical interface:
$ sudo service lightdm start

In order to manually install only the driver, use:

$ sudo ./cuda_8.0.44_linux.run --driver --silent

The installation will run through and output something like:

= Summary =

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-8.0
Samples:  Installed in /home/david, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-8.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

The summary also describes how to uninstall CUDA and gives hints on adapting PATH and LD_LIBRARY_PATH. It might be beneficial to do this in .bashrc and .bash_profile:

export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
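
Whether a new console actually picked up these variables can be checked with a few lines of Python. This is a rough sketch: the function name missing_cuda_paths is hypothetical, and the default directory is the one used in this article.

```python
import os


def missing_cuda_paths(env, cuda_home="/usr/local/cuda-8.0"):
    """Return (variable, directory) pairs that the installer asks for but env lacks."""
    required = {
        "PATH": os.path.join(cuda_home, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_home, "lib64"),
    }
    missing = []
    for name, directory in required.items():
        # Split the variable into its entries and ignore trailing slashes.
        entries = [e.rstrip("/") for e in env.get(name, "").split(os.pathsep)]
        if directory not in entries:
            missing.append((name, directory))
    return missing


if __name__ == "__main__":
    for name, directory in missing_cuda_paths(os.environ):
        print("{} does not include {}".format(name, directory))
```

If nothing is printed, both variables contain the CUDA 8.0 directories.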

Restart the console, and test the installation using:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
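
For scripted setups, the release can also be extracted from this output. The following is a sketch under the assumption that the last line keeps the format shown above; parse_nvcc_release is a hypothetical helper name.

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Extract (release, build) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+), V(\d+(?:\.\d+)*)", output)
    return match.groups() if match else None


if __name__ == "__main__":
    try:
        text = subprocess.check_output(["nvcc", "--version"]).decode()
    except (OSError, subprocess.CalledProcessError):
        text = ""  # nvcc is missing or failed to run
    print(parse_nvcc_release(text))
```

A result of None indicates that nvcc was not found or printed something unexpected.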

Additionally, the samples can be compiled, for example by navigating to ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery (if the default path for the samples was used) and then running:

$ make
$ ./deviceQuery

The output will provide some information about the detected graphics card(s):

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 660"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 1994 MBytes (2090401792 bytes)
  ( 5) Multiprocessors, (192) CUDA Cores/MP:     960 CUDA Cores
  GPU Max Clock rate:                            1084 MHz (1.08 GHz)
  Memory Clock rate:                             3004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 393216 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 660
Result = PASS
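
When the check should run unattended, the relevant parts of this output can be picked out with a short Python function. This is a sketch that assumes the "Device N:" and "Result =" lines look as shown above; summarize_device_query is a hypothetical name.

```python
import re


def summarize_device_query(output):
    """Collect device names and the final result from deviceQuery output."""
    devices = re.findall(
        r'^Device (\d+): "(.+)"[ \t]*$', output, flags=re.MULTILINE)
    result = re.search(r"^Result = (\w+)[ \t]*$", output, flags=re.MULTILINE)
    return {
        "devices": [name for _, name in devices],
        # Only an explicit "Result = PASS" line counts as success.
        "passed": bool(result) and result.group(1) == "PASS",
    }
```

The function expects the captured text of ./deviceQuery, for example as obtained with subprocess.check_output.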

Installing CuDNN

For installing CuDNN I also followed an answer by Martin Thoma on Ask Ubuntu and this Gist.

CuDNN can be downloaded from here (after registration as developer) by choosing Download > Download cuDNN v5.1 (August 10, 2016), for CUDA 8.0 > cuDNN v5.1 Library for Linux. The downloaded archive can be extracted anywhere (e.g. in ~/Downloads). Then, the header files and libraries need to be copied to the CUDA installation directory:

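# Run from the directory the archive was extracted to; the files are assumed to be in its cuda/ folder.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda-8.0/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64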

Unfortunately, there is no straightforward way to check whether CuDNN was successfully installed. Instead, this is verified during the installation of Tensorflow or Caffe, see below.
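
A partial check is still possible by reading the version macros from the copied header. The sketch below assumes that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, which to my knowledge holds for CuDNN 5.x; the function name is hypothetical. It only confirms that the header is in place, not that the libraries can be loaded.

```python
import re


def cudnn_version_from_header(header_text):
    """Read the CUDNN_MAJOR/MINOR/PATCHLEVEL defines, e.g. '5.1.5', or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, flags=re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    path = "/usr/local/cuda-8.0/include/cudnn.h"  # path used in this article
    try:
        with open(path) as handle:
            print(cudnn_version_from_header(handle.read()))
    except OSError:
        print("cudnn.h not found at", path)
```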

(Re-)Installing Tensorflow

Installing Tensorflow is straightforward using pip. It is important to use the correct Python version; in the following, I use Python 3.4, which is officially supported by Tensorflow. Therefore, I checked that pip corresponds to my Python 3.4 installation:

$ pip --version
pip 9.0.1 from /usr/local/lib/python3.4/dist-packages (python 3.4)

Alternatively, use pip3.4 or python3 -m pip. If Tensorflow was installed previously, it can be removed using:

$ sudo pip uninstall tensorflow

This is also recommended in the installation guide provided by Tensorflow.

The latest Tensorflow can then be (re-)installed using:

$ sudo pip install tensorflow

However, for correct GPU support, use:

# Ubuntu/Linux 64-bit, GPU enabled, Python 3.4
# Requires CUDA toolkit 8.0 and CuDNN v5. For other versions, see "Installing from sources" below.
$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.0rc0-cp34-cp34m-linux_x86_64.whl
$ sudo pip install --upgrade $TF_BINARY_URL

as outlined in the Tensorflow documentation. The links for different Python versions and operating systems are provided by Tensorflow here.
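
Picking the wrong link is an easy mistake, so the tags in the wheel file name can be compared with the interpreter before installing. This is a minimal sketch with hypothetical function names; it assumes the usual five-part wheel name (name, version, Python tag, ABI tag, platform tag) without an optional build tag.

```python
import sys


def parse_wheel_tags(url):
    """Split a wheel URL into name, version, Python, ABI and platform tags."""
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel name: " + filename)
    keys = ("name", "version", "python", "abi", "platform")
    return dict(zip(keys, parts))


def matches_interpreter(tags, version_info=sys.version_info):
    """Check whether the CPython tag of the wheel matches the interpreter."""
    expected = "cp{}{}".format(version_info[0], version_info[1])
    return tags["python"] == expected
```

For the URL above, parse_wheel_tags yields the Python tag cp34, so matches_interpreter is only true when called from Python 3.4.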

Finally, update the necessary environment variables as follows (e.g. in .bashrc and .bash_profile):

export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-8.0

Afterwards, restart the console.

Testing Tensorflow

To test the installation, use

$ pip show tensorflow
Name: tensorflow
Version: 0.12.0rc0
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/local/lib/python3.4/dist-packages
Requires: six, wheel, numpy, protobuf

to check the version. Then, open a Python console to see whether Tensorflow finds CUDA and CuDNN:

$ python3
Python 3.4.3 (default, Sep 14 2016, 12:36:27) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally

Alternatively, use the script proposed here to list all the devices available for Tensorflow:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

For example, this looks as follows:

$ python3
Python 3.4.3 (default, Sep 14 2016, 12:36:27) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> get_available_gpus()
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 660
major: 3 minor: 0 memoryClockRate (GHz) 1.0845
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.13GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 660, pci bus id: 0000:01:00.0)

(Re-)Installing Caffe

As I had Caffe already installed, I first removed the current installation (in my case in ~/caffe), downloaded the latest master branch and extracted it to ~/caffe. Then:

$ cd ~/caffe
$ cmake .

The printed CMake configuration can be checked for the correct CUDA and CuDNN versions. This may look as follows:

-- ******************* Caffe Configuration Summary *******************
-- General:
--   Version           :   1.0.0-rc3
--   Git               :   unknown
--   System            :   Linux
--   C++ compiler      :   /usr/bin/c++
--   Release CXX flags :   -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
--   Debug CXX flags   :   -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
--   Build type        :   Release
--   BUILD_python      :   ON
--   BUILD_matlab      :   OFF
--   BUILD_docs        :   ON
--   CPU_ONLY          :   OFF
--   USE_OPENCV        :   ON
--   USE_LEVELDB       :   ON
--   USE_LMDB          :   ON
-- Dependencies:
--   BLAS              :   Yes (Atlas)
--   Boost             :   Yes (ver. 1.54)
--   glog              :   Yes
--   gflags            :   Yes
--   protobuf          :   Yes (ver. 2.5.0)
--   lmdb              :   Yes (ver. 0.9.70)
--   LevelDB           :   Yes (ver. 1.15)
--   Snappy            :   Yes (ver. 1.1.0)
--   OpenCV            :   Yes (ver. 2.4.11)
--   CUDA              :   Yes (ver. 8.0)
--   Target GPU(s)     :   Auto
--   GPU arch(s)       :   sm_30
--   cuDNN             :   Yes (ver. 5.1.5)
-- Python:
--   Interpreter       :   /usr/bin/python2.7 (ver. 2.7.6)
--   Libraries         :   /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.6)
--   NumPy             :   /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.8.2)
-- Documentaion:
--   Doxygen           :   /usr/bin/doxygen (1.8.6)
--   config_file       :   /home/david/caffe/.Doxyfile
-- Install:
--   Install path      :   /home/david/caffe/build/install
-- Configuring done
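
Instead of scanning the summary by eye, the relevant entries can be pulled out with a small script. This is a sketch with hypothetical function names, assuming the "--   Key : Value" layout shown above.

```python
import re


def parse_cmake_summary(text):
    """Map the 'Key : Value' lines of the configuration summary to a dict."""
    summary = {}
    for line in text.splitlines():
        match = re.match(r"^--\s+(.+?)\s+:\s+(.*)$", line)
        if match:
            summary[match.group(1)] = match.group(2).strip()
    return summary


def gpu_support_ok(summary):
    """True if CUDA and cuDNN were found and CPU_ONLY is switched off."""
    return (summary.get("CUDA", "").startswith("Yes")
            and summary.get("cuDNN", "").startswith("Yes")
            and summary.get("CPU_ONLY") == "OFF")
```

The text can be obtained by redirecting the CMake output to a file and reading it in Python.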

Then, install Caffe (with -WITH_PYTHON_LAYERS=1 if necessary):

$ make
$ make pycaffe
# If desired:
$ sudo make install

Before testing the installation, add the required environment variables to .bashrc and/or .bash_profile:

export CAFFE_ROOT=/home/david/caffe
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
export PYTHONPATH=/home/david/caffe/python:$PYTHONPATH

Note that the paths to the Caffe and CUDA installations may need to be adapted.

Testing Caffe

Then, start a new console and try Caffe out. Note that Caffe was built against Python 2.7 here (see the CMake summary above), so the Python 2 interpreter is used:

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
>>> caffe.set_mode_gpu()

Tensorflow and Caffe in Spyder/PyCharm

To use Tensorflow or Caffe in Spyder or PyCharm without spending hours on configuring projects and environment variables, simply start Spyder or PyCharm from the console. This way, the IDE inherits the environment variables set in .bashrc.

