Paddle (Lite) - Q-engineering
Install Paddle on Jetson Nano

Install Paddle (Lite) deep learning framework on a Jetson Nano.


This page will guide you through the setup of Baidu's Paddle-Lite framework on a Jetson Nano.
PaddlePaddle is the Chinese counterpart of TensorFlow. It is widely used in industry, institutions and universities. It supports several hardware environments, including NPU acceleration with the Rockchip RK3399 found on many single-board computers. Software acceleration can be done with the CUDA and cuDNN libraries. For more information about the Paddle library, see the GitHub pages or the Chinese tutorial.

Paddle-Lite Installation.

We are going to install Paddle-Lite version 2.7.0 because it supports cuDNN 8.0, found in JetPack 4.5 on your Jetson Nano. This makes Paddle-Lite, together with MNN, one of only two Lite frameworks for small devices that support CUDA and cuDNN. All other frameworks have either no GPU acceleration or some form of Vulkan support, like, for instance, ncnn.

Note, this is a C++ installation; it is not suitable for Python. Paddle-Lite's Python interface relies on the PaddlePaddle framework. Since we don't want to use an additional 3 GByte of disk space just for the Python interface, we will not use it. And, as you know, speed and Python don't go hand in hand.

The Paddle-Lite framework has almost no dependencies. The libraries needed are downloaded and compiled automatically during installation. If you want OpenCV to have CUDA support as well, re-install it first; the installation guide is here and takes about an hour and a half. It is not mandatory. The entire installation of the latest version of Paddle-Lite on a Jetson Nano is as follows.
# check for updates
$ sudo apt-get update
$ sudo apt-get upgrade
# install dependencies
$ sudo apt-get install cmake wget
$ sudo apt-get install patchelf
# download Paddle Lite
$ git clone
$ cd Paddle-Lite
# build Paddle Lite (± 2 Hours)
$ ./lite/tools/ \
  --build_cv=ON \
  --build_extra=ON \
  --arm_os=armlinux \
  --arm_abi=armv8 \
  --arm_lang=gcc \
  cuda
# copy the headers and library to /usr/local/
$ sudo mkdir -p /usr/local/include/paddle-lite
$ sudo cp -r build.lite.armlinux.armv8.gcc/inference_lite_lib.armlinux.armv8/cxx/include/*.* /usr/local/include/paddle-lite

$ sudo mkdir -p /usr/local/lib/paddle-lite
$ sudo cp -r build.lite.armlinux.armv8.gcc/inference_lite_lib.armlinux.armv8/cxx/lib/*.* /usr/local/lib/paddle-lite
The compilation is ready after two hours. It takes up about 7.2 GByte on your disk.
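Disk space is the main constraint here; below is a quick hedged check with Python's standard library (the 8 GByte margin is just an illustrative threshold, not an official requirement):

```python
import shutil

# Query total/used/free bytes for the filesystem holding the current directory.
usage = shutil.disk_usage(".")
free_gb = usage.free / (1024 ** 3)
print(f"Free disk space: {free_gb:.1f} GByte")

# The Paddle-Lite build tree occupies roughly 7.2 GByte, so warn below ~8.
enough = free_gb >= 8.0
if not enough:
    print("Warning: the build may fill up your SD card")
```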



Please also note the folder with the examples.


After copying the headers and the library to the /usr/local/ folder, you may want to delete the entire Paddle-Lite folder. This frees up about 7 GByte on your SD card. Keep in mind that you also delete the samples, although those can be found on GitHub if needed. It's all up to you.
$ sudo rm -rf Paddle-Lite

OpenCV dependency.

First of all, you need to have OpenCV installed on your Jetson Nano. Paddle needs OpenCV for image rendering. Its wheel will install opencv-python from pip. Besides the known discouragement of an OpenCV pip installation, the required version is not available in any of the PyPI and piwheels databases, so pip falls back to an older version, something you don't want due to its avalanche of outdated dependencies.

Just keep things simple. Check which OpenCV version Python 3 imports. If needed, install a newer version according to our guide.


We have even removed the OpenCV requirement from our wheel, just to prevent installing duplicate versions on your Nano. Remember, space on the SD card is precious too.

Paddle pip3 installation.

The pip3 wheel is by far the easiest and most reliable way to install Paddle on your Jetson Nano, if you plan to use Paddle with Python 3 of course. The official Paddle distribution does not provide AARCH64 wheels. That's why we've put a wheel on our GitHub page that makes installation easy. Follow the instructions below. By the way, it may take a while to build the scipy library if needed.
# a fresh start
$ sudo apt-get update
$ sudo apt-get upgrade
# install dependencies
$ sudo apt-get install cmake wget
$ sudo apt-get install libatlas-base-dev libopenblas-dev libblas-dev
$ sudo apt-get install liblapack-dev patchelf gfortran
$ sudo -H pip3 install Cython
$ sudo -H pip3 install -U setuptools
$ pip3 install six requests wheel pyyaml
# upgrade version 3.0.0 -> 3.13.0
$ pip3 install -U protobuf
# download the wheel
$ wget
# install Paddle
$ sudo -H pip3 install paddlepaddle_gpu-2.0.0-cp36-cp36m-linux_aarch64.whl
# clean up
$ rm paddlepaddle_gpu-2.0.0-cp36-cp36m-linux_aarch64.whl
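The wheel's filename tells you exactly which interpreter and platform it targets, following the standard wheel naming scheme (PEP 427); a small sketch that unpacks the tags of the wheel used above:

```python
# Wheel filenames follow PEP 427:
#   {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
name = "paddlepaddle_gpu-2.0.0-cp36-cp36m-linux_aarch64.whl"
dist, version, py_tag, abi_tag, platform_tag = name[:-len(".whl")].split("-")

print(py_tag)        # cp36: CPython 3.6, the default Python 3 on JetPack 4.5
print(platform_tag)  # linux_aarch64: 64-bit ARM Linux, i.e. the Jetson Nano
```

If pip refuses the wheel with "not a supported wheel on this platform", one of these tags doesn't match your interpreter.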
If everything went well, you will get the following screen dump.


It may happen that the installation closes with a warning about the PATH. If so, add the location (~/.local/bin) at the end of the .bashrc file with $ sudo nano .bashrc as shown below.


You can check the Paddle installation on your Jetson Nano as follows.
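Assuming the wheel installed cleanly, Paddle 2.x ships a built-in self-test; a hedged sketch, with the import kept inside the function so the snippet can be loaded on machines without Paddle:

```python
def check_paddle():
    # Import inside the function so this file loads without Paddle installed.
    import paddle
    print("Paddle version:", paddle.__version__)
    # run_check() builds and runs a tiny network to verify the installation,
    # exercising the CUDA backend when a GPU is visible.
    paddle.utils.run_check()

# On the Jetson Nano, run it from a Python 3 shell:
# >>> check_paddle()
```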


Paddle installation from scratch.

You can build the Paddle deep learning framework from scratch on your Jetson Nano if you don't want to use the Python wheel or if you need the C++ API inference library. The whole procedure takes about 5 hours and uses approximately 20 GByte of your disk. Obviously, a 64 GB SD card is required as a minimum. As usual, the first step is to install the dependencies. The entire list is shown below.
# a fresh start
$ sudo apt-get update
$ sudo apt-get upgrade
# install dependencies
$ sudo apt-get install cmake wget curl unrar swig
$ sudo apt-get install libjpeg8-dev libpng-dev libfreetype6-dev
$ sudo apt-get install libatlas-base-dev libopenblas-dev libblas-dev
$ sudo apt-get install liblapack-dev patchelf gfortran
$ sudo -H pip3 install Cython
$ sudo -H pip3 install -U setuptools
$ sudo -H pip3 install six requests wheel pyyaml
$ sudo -H pip3 install numpy
# upgrade version 3.0.0 -> 3.13.0
$ sudo -H pip3 install -U protobuf
$ sudo -H pip3 install matplotlib
$ sudo -H pip3 install nltk
# scipy may take a while to compile
$ sudo -H pip3 install scipy
$ sudo -H pip3 install graphviz funcsigs
$ sudo -H pip3 install decorator prettytable
$ sudo -H pip3 install pillow

The installation of Paddle on the Jetson Nano requires a lot of memory. To meet this demand, some extra swap space is created with the dphys-swapfile tool. Once the installation is complete, the tool will be removed. Use the following commands to install dphys-swapfile.
# install dphys-swapfile
$ sudo apt-get install dphys-swapfile
# give the required memory size
$ sudo nano /etc/dphys-swapfile
# reboot afterwards
$ sudo reboot
2 GB swap space

If all went well, you should have something like this.

4 GB swap memory

For the record, the figure shown is the total amount of swap space allocated by dphys-swapfile and zram. Don't forget to remove dphys-swapfile when you're done.
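Before kicking off the build, you can verify the total swap (dphys-swapfile plus zram) from /proc/meminfo; a small sketch whose parser is exercised on a sample string so it also runs off-device:

```python
def swap_total_gb(meminfo_text):
    """Extract SwapTotal (reported in kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kb = int(line.split()[1])
            return kb / (1024 ** 2)  # kB -> GByte
    return 0.0

# Illustrative sample: 4 GByte of swap, as targeted in this guide.
sample = "MemTotal:        4059464 kB\nSwapTotal:       4194304 kB\n"
print(f"{swap_total_gb(sample):.1f} GByte swap")  # prints 4.0 GByte swap

# On the Nano itself:
# with open("/proc/meminfo") as f:
#     print(f"{swap_total_gb(f.read()):.1f} GByte swap")
```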
The next step is the installation of Nvidia's nccl package on the Jetson Nano. PaddlePaddle needs this inter-GPU communication library when using the CUDA backend. The installation can only be done from scratch as you can see in the instructions below.
$ cd ~
# download nccl
$ git clone
$ cd nccl
# compile (± 30 min)
$ make -j4
# install system wide
$ sudo make install

With all the dependencies in place, you can now download the Paddle package. On our GitHub page, we posted a package specially adapted for the Jetson Nano (aarch64). You can explore the differences from the original software with the compare button.


As can be seen in the overview, three modifications have been made. First, the already mentioned inclusion of OpenCV in the requirements has been removed. Second, to let PaddleHub work, the version number should not be 0.0.0, as happens now. That's why version 2.0.0 is hardcoded for now. In the future, when this version is labelled stable, this will no longer be necessary because the version routine will generate a workable number. Last, OpenBLAS needs extra directives not yet given by the cmake file (cmake/external/openblas.cmake). We have inserted them.
Feel free to fork the latest Paddle version from their GitHub page and modify the above files in your own fork to get the very latest software.
# get Paddle
$ git clone
$ cd Paddle
# get a working directory
$ mkdir build
$ cd build
# run cmake
$ cmake -D PY_VERSION=3.6 \
        -D WITH_MKL=OFF \
        -D WITH_MKLDNN=OFF \
        -D WITH_NV_JETSON=ON \
        -D WITH_XBYAK=OFF \
        -D WITH_GPU=ON \
        -D ON_INFER=ON \
        -D CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
        -D CMAKE_BUILD_TYPE=Release ..
As you can see in the parameter list above, both a Python wheel and an inference library are generated.


Before we can start the build, the number of files open at the same time needs to be increased from the default 1024 to 2048. This is done with the ulimit command. Once set, make is called with just one core ($ make -j1). Even with two, the build will crash due to smothering memory swapping.
# increase the number of open files
$ ulimit -n 2048
# run with one core
$ make -j1
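You can confirm from Python that the new limit actually took effect in the current shell session; a quick sketch using the standard resource module:

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors,
# the same value that `ulimit -n` reports in the shell.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit (soft/hard):", soft, hard)

# The Paddle build wants at least 2048 files open at the same time.
ok = soft >= 2048
```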

Find your Python installation wheel here. It's identical to the one we have on our GitHub page, provided, of course, you haven't forked a later version of Paddle.


You can now install the Python Paddle version with the wheel as described above.
If you had to install the dphys-swapfile service, it's now time to delete it. This way you will extend the life of your SD card.
# install Paddle Python
$ sudo -H pip3 install python/dist/paddlepaddle_gpu-2.0.0-cp36-cp36m-linux_aarch64.whl
# remove the dphys-swapfile
$ sudo /etc/init.d/dphys-swapfile stop
$ sudo apt-get remove --purge dphys-swapfile

You can test your installation with the following command.


Paddle C++ API library.

The installation of the Paddle C++ API library follows the exact same steps as the installation of the Python version. If you have set the parameter -D ON_INFER=ON, the libraries are in the following location.



PaddleHub.

PaddleHub is a collection of script files that load and deploy pre-trained Paddle models on your Jetson Nano with one click, as we'll see.
PaddleHub uses the Paddle framework, so you must have it installed first. Obviously, you must have a working internet connection to access the Paddle server.

The installation is identical to the one used for the Raspberry Pi 4. Please follow the instructions given in that paragraph.

Conversion to Paddle Lite.

This section describes the conversion of regular, so-called fluid models, from the PaddlePaddle framework to Paddle Lite models used in embedded systems such as a Raspberry Pi 4 or Jetson Nano.
As with TensorFlow Lite, not all models are portable to the Lite framework. Some operations are not supported. The conversion tool will let you know if that's the case. Before we can build the conversion tool, some preparation is needed. First, we need at least 6 GB of RAM, just like when building PaddlePaddle from scratch. Since the procedure is almost identical to that of the Raspberry Pi, we'll use some of the RPi screen dumps here.


Next, the -m64 flag should be disabled, because an aarch64 system doesn't recognize this flag.


The -m64 flag is declared in flags.cmake at line 151. The file is located in the ~/Paddle-Lite/cmake folder. Expand line 151 with the text AND NOT(CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64.*|AARCH64.*)"), as shown below. It effectively disables the -m64 flag, which is unknown to aarch64 machines. We have made a pull request with Paddle Lite.


With these steps done, you can now begin to compile the Paddle Lite conversion tool. We assume you have already successfully built the Paddle Lite framework as described above. The build of the conversion tool is as follows.
$ cd ~/Paddle-Lite
# build 64-bit Paddle Lite optimize tool
$ ./lite/tools/ \
 --build_cv=ON \
 --build_extra=ON \
 --arm_os=armlinux \
 --arm_abi=armv8 \
 --arm_lang=gcc \
 cuda

Once you have the optimizer, you can download the deep learning model of your interest using PaddleHub. As an example, we will use the face mask detector.
The fluid model, as PaddlePaddle calls its deep learning models, comes in two variants. Either it is a combined model with the parameters (topology) and weights in a single file, or the parameters and weights are each stored in a separate file. The optimizer handles both types. As it happens, the face mask detector has two files.
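A small helper (hypothetical, not part of Paddle) to see which variant a downloaded model directory contains, based on the __model__ / __params__ naming convention used by the face mask detector:

```python
import os

def model_variant(model_dir):
    """Classify a Paddle inference model directory by its file layout."""
    files = set(os.listdir(model_dir))
    if "__model__" in files and "__params__" in files:
        return "combined"   # topology plus all weights in one params file
    if "__model__" in files:
        return "separate"   # weights stored in individual parameter files
    return "unknown"
```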
So the next step is extracting the __model__ and __params__ files from the network. It's done with a few Python instructions. As you can see, we use both PaddlePaddle and PaddleHub.
$ python3
>>> import paddlehub as hub 
>>> import paddle

>>> paddle.enable_static() 
>>> pyramidbox_lite_mobile_mask = hub.Module(name="pyramidbox_lite_mobile_mask") 
>>> pyramidbox_lite_mobile_mask.save_inference_model(dirname="test_program")

The last action is converting the model and parameters to a Paddle Lite version with the just-built opt application.
This is done with the following command.
$ ./opt \
--model_file=/home/pi/test_program/mask_detector/__model__ \
--param_file=/home/pi/test_program/mask_detector/__params__ \
--valid_targets=arm \
--optimize_out_type=naive_buffer \

Please note the version number V2.7. Most models are neither backwards nor forwards compatible.
More information about the optimize tool can be found in the wiki.
If you had to install dphys-swapfile, it's time to uninstall it again. This way you will extend the life of your SD card.
# remove the dphys-swapfile (if installed)
$ sudo /etc/init.d/dphys-swapfile stop
$ sudo apt-get remove --purge dphys-swapfile