TensorFlow Lite - Q-engineering
TensorFlow Lite on Jetson Nano

Install TensorFlow Lite on the Jetson Nano.

Introduction.

The first part of this guide will walk you through installing TensorFlow Lite on the Jetson Nano.
The second part guides you through the installation of TensorFlow Lite with GPU delegates. It must be said that the expected acceleration is somewhat disappointing.
The third part covers C++ examples used to get an impression of the performance of TensorFlow Lite on your Nano.

TensorRT ships by default with the Jetson Nano as its deep learning framework. It is a C++ library based on CUDA and cuDNN. Due to its low-level structure, it requires quite proficient programming skills; not something you set up on a rainy afternoon. That's why we are not covering the TensorRT framework here, although its execution is just a bit faster than TensorFlow Lite's.

Preparations.

If you want to run the TensorFlow Lite examples we provide, please make sure you have OpenCV installed on your Jetson Nano. It may be the default version without CUDA support, or you can re-install OpenCV with CUDA support according to our guide.

Install TensorFlow Lite.

If you want to build fast deep learning applications, you have to use C++. That's why you need to build TensorFlow Lite's C++ API libraries. The procedure is simple: download the latest GitHub repository and run two scripts. The commands are listed below. This installation ignores the CUDA GPU on board the Jetson Nano; it is purely CPU-based.
# the tools needed
$ sudo apt-get install cmake curl
# download TensorFlow version 2.4.1
$ wget -O tensorflow.zip https://github.com/tensorflow/tensorflow/archive/v2.4.1.zip
# unpack and give the folder a convenient name
$ unzip tensorflow.zip
$ mv tensorflow-2.4.1 tensorflow
$ cd tensorflow
# get the dependencies
$ ./tensorflow/lite/tools/make/download_dependencies.sh
# run the C++ installation
$ ./tensorflow/lite/tools/make/build_aarch64_lib.sh
(Screenshot: TensorFlow Lite build finished on the Jetson Nano)

The TensorFlow Lite FlatBuffers library is also needed. Please use the following commands.
# install the flatbuffers
$ cd ~/tensorflow/tensorflow/lite/tools/make/downloads/flatbuffers
$ mkdir build
$ cd build
$ cmake ..
$ make -j4
$ sudo make install
$ sudo ldconfig
# clean up
$ cd ~
$ rm tensorflow.zip
If everything went well, you should have the two libraries and two folders with header files as shown in the slide show.
As of version 2.3.0, TensorFlow Lite uses dynamic linking. At runtime, libraries are copied to RAM and pointers are relocated before TF Lite can run. This strategy gives greater flexibility. It also means that TensorFlow Lite now requires glibc 2.28 or higher to run. From now on, link the libdl library when building your application, otherwise you get undefined reference to symbol dlsym@@GLIBC_2.17 linker errors. The symbolic link can be found at /lib/aarch64-linux-gnu/libdl.so.2 on a 64-bit Linux OS. Please see our examples on GitHub.
You now have a fully operational version of TensorFlow Lite 2.4.1 on your Jetson Nano. As you may have discovered, the installation barely differs from the one used for a Raspberry Pi 4 with a 64-bit OS.
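As an illustration only, a minimal link line could look like the sketch below. The source file name and the gen/ target subdirectory are placeholders; check tensorflow/lite/tools/make/gen/ for the exact folder name on your system. The important part is the trailing -ldl.
# hypothetical link command for a TensorFlow Lite C++ application
# <target> is a placeholder for the folder created by build_aarch64_lib.sh
$ g++ -std=c++11 app.cpp -o app \
      -I ~/tensorflow \
      ~/tensorflow/tensorflow/lite/tools/make/gen/<target>/lib/libtensorflow-lite.a \
      -lflatbuffers -lpthread -ldl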

GPU delegate.

Originally developed for smartphones and other small devices, TensorFlow Lite was never meant to meet a CUDA GPU. Hence, it does not support CUDA or cuDNN. On the other hand, it can use so-called GPU delegates. GPU hardware found in cell phones, such as Mali GPUs, is used to accelerate tensor calculations in the hope of gaining speed. Most of this hardware is driven by OpenCL or OpenGL ES software. The Jetson Nano does not have OpenCL, but the OpenGL ES API comes with JetPack.

As mentioned before, the final performance is a bit lower than using the quad-core CPU alone. It probably has to do with the fact that TensorFlow Lite transfers all calculations to the GPU. There is no balanced mix between GPU and CPU, as found in ncnn, MNN or Paddle Lite. No, the GPU has to handle everything. It can do specific tasks well, as explained here, but others turn out very badly. There are even certain operations the GPU delegate cannot execute at all, for instance Concatenation or Logistic, found in MobileNetV1.

(Screenshot: GPU delegate errors for unsupported operations)

A second reason for the disappointing performance is that TensorFlow Lite is forced to use OpenGL ES, since the much more powerful OpenCL is not available on the Jetson Nano.

Note that you also need the previously built libtensorflow-lite.a and libflatbuffers.a libraries when deploying a GPU delegate.
The installation requires a lot of resources. In order to compile the GPU delegate C++ API, Bazel has to be installed first.
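For orientation, deploying the delegate in an application looks roughly like the sketch below. It is a minimal sketch, not our full example code; the model file name is a placeholder and error handling is omitted.
// minimal sketch: running a TF Lite model through the GPU delegate
// "model.tflite" is a placeholder file name
#include <memory>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/delegates/gpu/delegate.h"

int main() {
  // load the flatbuffer model and build an interpreter
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // create the GPU delegate with default options and hand the graph over to it
  TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
  TfLiteDelegate* delegate = TfLiteGpuDelegateV2Create(&options);
  interpreter->ModifyGraphWithDelegate(delegate);   // fails on unsupported operations

  interpreter->AllocateTensors();
  // ... fill the input tensor here ...
  interpreter->Invoke();
  // ... read the output tensor here ...

  TfLiteGpuDelegateV2Delete(delegate);
  return 0;
}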

Memory swap size.

Building the full TensorFlow Lite 2.4.1 package requires more than 6 GByte of RAM. It's best to temporarily reinstall dphys-swapfile to get the extra space from your SD card. Once the installation is complete, we will delete dphys-swapfile again. Use the following commands.
# install dphys-swapfile
$ sudo apt-get install dphys-swapfile
# give the required memory size
$ sudo nano /etc/dphys-swapfile
# reboot afterwards
$ sudo reboot
(Screenshot: 2 GByte swap space)
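The entry to change in /etc/dphys-swapfile is CONF_SWAPSIZE. A minimal sketch, assuming the 2 GByte shown above and the default CONF_MAXSWAP limit of 2048:
# /etc/dphys-swapfile: swap size in MByte (2048 = 2 GByte)
CONF_SWAPSIZE=2048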

If all went well, you should have something like this.

(Screenshot: 4 GByte of swap memory in total)
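To check the combined total of zram and dphys-swapfile swap yourself, free lists the numbers in MByte.
# show memory and swap totals in MByte
$ free -m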

For the record, the figure shown is the total amount of swap space allocated by dphys-swapfile and zram. Don't forget to remove dphys-swapfile when you're done.

Bazel.

Bazel is a free software tool from Google used for automatically building and testing software packages. You could compare it to CMake, used by OpenCV, but the latter only builds software and has no test facility. Bazel is written in Java, a platform-independent language largely based on C++ in terms of syntax. To compile Bazel, we must first install Java and some other dependencies with the following commands.
# get a fresh start
$ sudo apt-get update
$ sudo apt-get upgrade
# install pip and pip3
$ sudo apt-get install python-pip python3-pip
# install some tools
$ sudo apt-get install build-essential zip unzip curl
# install Java
$ sudo apt-get install openjdk-11-jdk
Next, we can download and unzip the Bazel software. We need Bazel release 3.1.0 for TensorFlow Lite 2.4.1, so be sure to install the right version.
$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-dist.zip
$ unzip -d bazel bazel-3.1.0-dist.zip
$ cd bazel
During the build, Bazel uses a predefined ratio of the available working memory. This does not work well with the limited RAM of the Jetson Nano. To prevent crashes, we must explicitly cap this memory at 1600 MByte during the procedure. This is done by adding some extra information to the script file compile.sh. Add the text -J-Xmx1600M to the line that begins with run (around line 154). See the screen below. Use the well-known <Ctrl+X>, <Y>, <Enter> to save the change.
$ nano scripts/bootstrap/compile.sh -c
(Screenshot: compile.sh with the 1600 MByte heap limit added)
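For reference, the edited line looks roughly like the sketch below. The surrounding flags may differ per Bazel release; only the added -J-Xmx1600M at the end matters.
# scripts/bootstrap/compile.sh, around line 154 (flags may differ per Bazel release)
run "${JAVAC}" -classpath "${classpath}" -sourcepath "${sourcepath}" \
      -d "${output}/classes" -source "$JAVA_VERSION" -target "$JAVA_VERSION" \
      -encoding UTF-8 "@${paramfile}" -J-Xmx1600M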

Once the Java heap for Bazel has been capped at 1600 MByte, you can start building the Bazel software with the next commands. When finished, copy the binary to /usr/local/bin so that bash can find the executable anywhere. The final action is to delete the zip file. The total build takes about 33 minutes.
# start the build
$ env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
# copy the binary
$ sudo cp output/bazel /usr/local/bin/bazel
# clean up
$ cd ~
$ rm bazel-3.1.0-dist.zip
# if you have copied bazel to /usr/local/bin, you may also
# delete the whole bazel directory, freeing another 500 MByte
$ sudo rm -rf bazel

Build TensorFlow Lite GPU delegate.

With Bazel up and running, we can start building the GPU delegate for TensorFlow Lite 2.4.1 on our Jetson Nano. Download TensorFlow from GitHub and unpack the software.
# download TensorFlow 2.4.1
$ wget -O tensorflow.zip https://github.com/tensorflow/tensorflow/archive/v2.4.1.zip
# unpack and give the folder a convenient name
$ unzip tensorflow.zip
$ mv tensorflow-2.4.1 tensorflow
The next step before compiling the GPU delegate is to configure Bazel. This is done with a script file and command-line options. Let's start with the script. With the following command, Bazel asks you a few questions; define Python 3 as the default Python version.
$ cd tensorflow
$ ./configure
jetson@nano:~/tensorflow$ ./configure
You have bazel 3.1.0- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python3]: <enter>

Found possible Python library paths:
 /usr/local/lib/python3.6/dist-packages
 /usr/lib/python3.6/dist-packages
 /usr/lib/python3/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python3.6/dist-packages] <enter>
/usr/lib/python3/dist-packages

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Found CUDA 10.2 in:
   /usr/local/cuda-10.2/targets/aarch64-linux/lib
   /usr/local/cuda-10.2/targets/aarch64-linux/include
Found cuDNN 8 in:
   /usr/lib/aarch64-linux-gnu
   /usr/include
Found TensorRT 7 in:
   /usr/lib/aarch64-linux-gnu
   /usr/include/aarch64-linux-gnu

Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 5.3

Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: <enter>

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: <enter>

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl          # Build with MKL support.
--config=monolithic   # Config for mostly static monolithic build.
--config=ngraph       # Build with Intel nGraph support.
--config=numa         # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2           # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws        # Disable AWS S3 filesystem support.
--config=nogcp        # Disable GCP support.
--config=nohdfs       # Disable HDFS support.
--config=nonccl       # Disable NVIDIA NCCL support.
Configuration finished

With the configuration all set and done, you can start the build with the command below. For clarity, it is one long line.
$ sudo bazel build -s -c opt --copt="-DMESA_EGL_NO_X11_HEADERS" tensorflow/lite/delegates/gpu:libtensorflowlite_gpu_delegate.so
After about 7 minutes, you will see the following screen.

(Screenshot: GPU delegate compiled)

You will find the library at the mentioned location.

(Screenshot: GPU delegate folder)
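With the build command above, Bazel normally places the library in its bazel-bin output tree; a quick check could look like this.
# typical output location of the GPU delegate library (assuming the build command above)
$ ls bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so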


If you had to reinstall dphys-swapfile, it's time to uninstall it again. This way you will extend the life of your SD card.
# remove the dphys-swapfile (if installed)
$ sudo /etc/init.d/dphys-swapfile stop
$ sudo apt-get remove --purge dphys-swapfile

Benchmark.

With the GPU delegate library in place, it's time to do some testing. Four well-known TensorFlow Lite models have been deployed with and without GPU delegates at two different clock speeds, one overclocked, the other at the default speed. Additionally, some numbers from an overclocked Raspberry Pi 4 have been added to the table as well. The results speak for themselves. All code is on our GitHub pages. Just click on the name of the model, and the corresponding C++ example shows up.
Model        | CPU 2 GHz | GPU delegate 2 GHz | CPU 1.47 GHz | GPU delegate 1.47 GHz | Raspberry Pi 4 1.9 GHz
             | 15.2 FPS  | 11.8 FPS           | 12 FPS       | 11 FPS                | 9.4 FPS
             | 28.5 FPS  | - FPS              | 21.8 FPS     | - FPS                 | 24 FPS
MobileNetV1  | 50 FPS    | - FPS              | 36.3 FPS     | - FPS                 | 38.5 FPS
             | 11 FPS    | 9.1 FPS            | 9 FPS        | 8.3 FPS               | 7.2 FPS