ncnn - Q-engineering
Install ncnn software on Raspberry Pi 5

Install ncnn deep learning framework on a Raspberry Pi 5.

Last updated: May 26, 2024


This page guides you through the installation of Tencent's ncnn framework on a Raspberry Pi 5. We cover only the basics, so that in the end you can build your own application. For more information about the ncnn library, see the project's own documentation. Perhaps needless to say, the installation is the C++ version. It is not suitable for Python.


The ncnn framework has almost no dependencies. It requires protobuf to load ONNX models. OpenCV is useful, but not required.


RTTI stands for Run-Time Type Identification. It is a C++ mechanism used to determine the type and memory size of an object at runtime, when these are not yet known at compile time. Normally, a programmer knows the type of a variable, so the memory that holds the object can be allocated in advance. Obtaining memory from an operating system, with all its processes and threads, can be a relatively time-consuming operation. When a C++ compiler knows how much memory is required, a single call to memory management is sufficient. This is one of the main advantages over Python, which is oblivious to memory requirements until it reaches the line of code that uses a variable.

It is best not to use RTTI if you want to write the fastest possible code. The ncnn framework follows this rule.
By default, it is compiled with the -fno-rtti flag, which prevents the use of RTTI. When ncnn is compiled with this flag, custom-defined layers like those found in YOLOv5 are only usable if the rest of the program is also built without RTTI.

Sometimes it is not possible to avoid RTTI, especially in mature code that needs a new function without all of the code being rewritten. OpenCV uses the RTTI mechanism in some places.
This causes a problem. When you compile ncnn together with OpenCV, the compiler reports an error because of the -fno-rtti flag. Removing the flag sometimes works with the ncnn code, depending on the type of DNN used.

This explains the purpose of the build flag -D NCNN_DISABLE_RTTI=OFF. It tells the build system to compile ncnn with RTTI allowed. You can then use ncnn with its full functionality without running into problems with OpenCV or, for that matter, any other piece of software that uses RTTI.
Performance-wise, you will not notice any difference on your Raspberry Pi.
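The effect of the -fno-rtti flag can be seen in isolation with a small experiment. The following is a minimal sketch, not part of the ncnn build: it assumes g++ is installed, and the file rtti_demo.cpp and the Layer struct are hypothetical names made up for this illustration. It compiles the same file twice, once with default flags and once with -fno-rtti.

```shell
# Sketch only: rtti_demo.cpp and Layer are hypothetical, not ncnn code.
rtti_check() {
    if ! command -v g++ >/dev/null 2>&1; then
        echo "g++ not found, skipping"
        return 0
    fi
    tmp=$(mktemp -d) || return 1
    # A tiny program that asks for run-time type information via typeid.
    cat > "$tmp/rtti_demo.cpp" <<'EOF'
#include <typeinfo>
struct Layer { virtual ~Layer() {} };
int main() { Layer l; return typeid(l).name() == nullptr; }
EOF
    # Default flags: RTTI is enabled.
    if g++ -c "$tmp/rtti_demo.cpp" -o "$tmp/with_rtti.o" 2>/dev/null; then
        echo "default flags: compiles"
    else
        echo "default flags: fails"
    fi
    # Same file with RTTI disabled.
    if g++ -fno-rtti -c "$tmp/rtti_demo.cpp" -o "$tmp/no_rtti.o" 2>/dev/null; then
        echo "-fno-rtti: compiles"
    else
        echo "-fno-rtti: fails"
    fi
    rm -rf "$tmp"
}
rtti_check
```

With GCC, the second compile is expected to be rejected because typeid needs RTTI. This is the same kind of conflict that appears when code relying on RTTI meets a build that disables it.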


Install OpenCV first if it is not already installed. A separate installation guide covers this; it takes about an hour.
The entire installation of ncnn on a Raspberry Pi is as follows.
# check for updates
$ sudo apt-get update
$ sudo apt-get upgrade
# install dependencies
$ sudo apt-get install cmake wget
$ sudo apt-get install libprotobuf-dev protobuf-compiler
# download ncnn
$ git clone --depth=1 <URL of the ncnn repository>
# install ncnn
$ cd ncnn
$ mkdir build
$ cd build
$ cmake -D NCNN_DISABLE_RTTI=OFF \
 -D CMAKE_TOOLCHAIN_FILE=../toolchains/aarch64-linux-gnu.toolchain.cmake ..
$ make -j4
$ make install
# copy output to /usr/local folders
$ sudo cp -r install/include/ncnn /usr/local/include/ncnn
$ sudo mkdir -p /usr/local/lib/ncnn
$ sudo cp -r install/lib/libncnn.a /usr/local/lib/ncnn/libncnn.a
If everything went well, you will have two folders: one with all header files and one with the library.
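A quick way to confirm the result is to test for the copied files. The sketch below uses the two target paths from the copy commands above; check_ncnn_install is a hypothetical helper name, and net.h is used as a representative ncnn header.

```shell
# Report whether each given path exists.
check_ncnn_install() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
        fi
    done
}

# Paths as used by the copy commands of this guide.
check_ncnn_install /usr/local/include/ncnn/net.h /usr/local/lib/ncnn/libncnn.a
```

If either line reports "missing", repeat the copy step before building your own application.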



Please also note the folder with the examples. Many different types of deep learning networks are covered there. The references to the actual deep learning models can sometimes cause errors due to version changes in the ncnn library. We recently received the following repository from nihui with the latest models:

The ncnn framework comes with some useful tools, located in the ~/ncnn/build/install/bin folder.


If you don't need the tools, or the examples, you can delete the entire ncnn folder. It saves you about 164 MB of disk space.
$ cd ~
$ sudo rm -rf ncnn

Vulkan support.

Recently, the Raspberry Pi Foundation has incorporated Vulkan drivers into their OS. ncnn can use Vulkan as an accelerator for its tensor calculations. However, the current Vulkan API isn't well suited to deep learning tasks. Neural networks love to use 16-bit floating points or 8-bit integers to speed up the calculations. Neither is well supported in the current release yet.
We have done some testing; the benchmark is shown below. The numbers after the bar are those measured without Vulkan acceleration. Overall, the Vulkan results are very disappointing.
pi@raspberrypi:~/ncnn/benchmark $ hostnamectl
Static hostname: raspberrypi
      Icon name: computer
     Machine ID: 072da82a1b314b32824f766429af0208
        Boot ID: 9f0761b989fb405099fa9c28c8443253
Operating System: Debian GNU/Linux 12 (bookworm)  
         Kernel: Linux 6.6.28+rpt-rpi-2712
   Architecture: arm64

pi@raspberrypi:~/ncnn/benchmark $ ./benchncnn 10 4 0 0 -1
[0 V3D 7.1.7]  queueC=0[1]  queueG=0[1]  queueT=0[1]
[0 V3D 7.1.7]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 V3D 7.1.7]  fp16-p/s/u/a=1/1/1/0  int8-p/s/u/a=1/1/1/0
[0 V3D 7.1.7]  subgroup=16  basic/vote/ballot/shuffle=1/0/0/0
[0 V3D 7.1.7]  fp16-8x8x16/16x8x8/16x8x16/16x16x16=0/0/0/0
[1 llvmpipe (LLVM 15.0.6, 128 bits)]  queueC=0[1]  queueG=0[1]  queueT=0[1]
[1 llvmpipe (LLVM 15.0.6, 128 bits)]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[1 llvmpipe (LLVM 15.0.6, 128 bits)]  fp16-p/s/u/a=1/1/1/1  int8-p/s/u/a=1/1/1/1
[1 llvmpipe (LLVM 15.0.6, 128 bits)]  subgroup=4  basic/vote/ballot/shuffle=1/1/1/1
[1 llvmpipe (LLVM 15.0.6, 128 bits)]  fp16-8x8x16/16x8x8/16x8x16/16x16x16=0/0/0/0
loop_count = 10
num_threads = 4
powersave = 0
gpu_device = 0
cooling_down = 1         
                      Vulkan (ms) | no Vulkan (ms)
          squeezenet       123.38 |      7.38
     squeezenet_int8         9.26 |      7.21
           mobilenet       169.70 |     33.88
      mobilenet_int8        10.33 |      8.74
        mobilenet_v2       126.98 |     10.52
        mobilenet_v3       118.44 |      7.35
          shufflenet        69.73 |      4.16
       shufflenet_v2        92.63 |      3.39
             mnasnet       122.38 |      7.12
     proxylessnasnet       126.68 |      7.88
     efficientnet_b0       196.14 |     12.33
   efficientnetv2_b0       270.95 |     13.69
        regnety_400m       148.22 |     10.99
           blazeface        26.02 |      1.47
           googlenet       344.65 |     25.35
      googlenet_int8        30.04 |     24.12
            resnet18       349.42 |     19.96
       resnet18_int8        20.91 |     16.74
             alexnet       232.37 |     21.36
               vgg16      1797.62 |    129.24
          vgg16_int8       120.69 |    107.01
            resnet50       866.48 |     46.21
       resnet50_int8        54.28 |     40.21
      squeezenet_ssd       457.84 |     30.37
 squeezenet_ssd_int8        32.89 |     28.15
       mobilenet_ssd       397.07 |     24.52
  mobilenet_ssd_int8        25.26 |     22.05
      mobilenet_yolo       815.46 |     58.19
  mobilenetv2_yolov3       418.37 |     37.68
         yolov4-tiny       680.02 |     46.29
           nanodet_m       205.37 |     11.18
    yolo-fastest-1.1       107.62 |      5.62
      yolo-fastestv2        80.40 |      4.80
  vision_transformer     21355.78 |    611.65
          FastestDet        84.98 |      5.34
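To put the two columns in proportion, the slowdown factor is the Vulkan time divided by the time without Vulkan. The sketch below computes it for three rows of the benchmark; slowdown is a hypothetical helper name, and awk is assumed to be available.

```shell
# Print how many times slower the Vulkan run is than the run without Vulkan.
# $1 = time with Vulkan, $2 = time without Vulkan (same unit)
slowdown() {
    awk -v v="$1" -v c="$2" 'BEGIN { printf "%.1f\n", v / c }'
}

slowdown 123.38 7.38        # squeezenet          -> 16.7
slowdown 9.26 7.21          # squeezenet_int8     -> 1.3
slowdown 21355.78 611.65    # vision_transformer  -> 34.9
```

In these three rows, the floating-point models run roughly 17 to 35 times slower with Vulkan, while the int8 model stays close to its time without Vulkan.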