Deep learning software for Raspberry Pi and alternatives in 2020.
On this page, we explore various deep learning frameworks that are especially suitable for the Raspberry Pi or its alternatives. Although almost any framework can eventually be installed on a 64-bit Linux system, not all of them are included in the list. Only the most popular frameworks, specially adapted for ARM cores, are discussed.
All discussed frameworks are suitable for a Raspberry Pi with a 32-bit or 64-bit operating system. They can all run a variety of deep learning models. However, on a Raspberry Pi they are only used to run models, not to train them. The Raspberry Pi lacks the computing power required for training deep learning networks.
A separate page covers the installation of each of the frameworks. Each installation page is easily accessible via the clickable arrow to the right of the framework's title.
OpenCV now has an excellent deep learning module. The OpenCV DNN module runs models from a wide range of frameworks: TensorFlow, Caffe, Torch, Darknet and ONNX. It only runs already trained models, so it is not possible to train networks with new data. A nice feature is that it has no other dependencies; only the OpenCV library is used when running your model. As is well known, OpenCV can run on many different computers, from a Raspberry Pi or alternative to a Windows or Linux PC. All of them can now be used to run deep learning models. On top of that, hardware acceleration is possible, because OpenCV DNN supports CUDA and OpenCL. More information and software examples can be found on our page here.
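As a rough illustration of how the DNN module accepts models from these different frameworks, the sketch below maps a model file's extension to the framework it came from and then hands the file to cv2.dnn.readNet, which selects the matching importer itself. The helper names origin_framework and load_model, the extension table and the use_cuda flag are our own assumptions for this example and are not part of OpenCV. The CUDA backend is only available when OpenCV was built with CUDA support.

```python
import os

# Assumed mapping from common model file extensions to their source framework.
# The table is illustrative; cv2.dnn.readNet does its own detection.
FRAMEWORK_BY_EXTENSION = {
    ".pb": "TensorFlow",
    ".caffemodel": "Caffe",
    ".t7": "Torch",
    ".weights": "Darknet",
    ".onnx": "ONNX",
}


def origin_framework(model_path):
    """Return the framework name guessed from the model file extension."""
    extension = os.path.splitext(model_path)[1].lower()
    if extension not in FRAMEWORK_BY_EXTENSION:
        raise ValueError("unsupported model file: " + model_path)
    return FRAMEWORK_BY_EXTENSION[extension]


def load_model(model_path, config_path="", use_cuda=False):
    """Load a trained model with OpenCV DNN (hypothetical helper).

    OpenCV is imported here so that the rest of this sketch can be used
    on a machine without OpenCV installed.
    """
    import cv2

    origin_framework(model_path)  # fail early on unknown file types
    net = cv2.dnn.readNet(model_path, config_path)
    if use_cuda:
        # Requires an OpenCV build with CUDA enabled.
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    return net


if __name__ == "__main__":
    for name in ("mobilenet.caffemodel", "yolov4-tiny.weights", "resnet18.onnx"):
        print(name, "->", origin_framework(name))
```

Caffe and Darknet models also need their network description file (.prototxt or .cfg), which would be passed as config_path.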
TensorFlow Lite is a stripped-down version of TensorFlow. It is designed for small devices, like a smartphone or Raspberry Pi. By replacing floating-point numbers with 8-bit signed integers and pruning those inputs that have no bearing on the output, this framework is capable of running deep learning models very fast. The price of this speed is accuracy. Usually, accuracy drops by a few percent when running TensorFlow Lite. The conversion to a TensorFlow Lite model is another point. Obviously, only TensorFlow models can be converted. In addition, TensorFlow Lite doesn't support every TensorFlow operation. More information and software examples can be found on this page.
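The trade-off between speed and accuracy can be made concrete with a small sketch of 8-bit quantization. It maps a list of floating-point weights to signed 8-bit integers using a scale and a zero point, and then maps them back to show the rounding error. This is a simplified illustration under our own assumptions (one scale for the whole list, the function names quantize and dequantize); it is not the TensorFlow Lite converter itself.

```python
def quantize(values):
    """Map floats to signed 8-bit integers with an affine scale and zero point."""
    qmin, qmax = -128, 127
    # The represented range must contain 0.0 so that zero stays exact.
    low = min(min(values), 0.0)
    high = max(max(values), 0.0)
    scale = (high - low) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = int(round(qmin - low / scale))
    quantized = []
    for value in values:
        q = int(round(value / scale)) + zero_point
        quantized.append(max(qmin, min(qmax, q)))  # clamp to the int8 range
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map the 8-bit integers back to approximate floats."""
    return [scale * (q - zero_point) for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.3, 0.5, 1.0]
    q, scale, zero_point = quantize(weights)
    restored = dequantize(q, scale, zero_point)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.5f})")
```

Each weight now occupies one byte instead of four, and integer arithmetic is cheap on ARM cores. The small per-weight errors printed above are where the accuracy loss comes from.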
The ncnn framework is a fast framework built by the Chinese internet company Tencent. It has handcrafted NEON assembly code, specially designed for the ARM cores found in the Raspberry Pi, its alternatives and nowadays in almost every smartphone. It is a very lightweight, yet powerful library, which is easy to install. The ncnn framework has few dependencies. You can even use it as a standalone static library, which, when linked to your program, leaves a small footprint. Still, it is capable of running most deep learning models at impressive speeds. This framework is one of our favourites.
The ncnn framework is written in pure C++ and has no Python version. It has its own model format. You have to convert your existing model to the ncnn file format with the supplied tools. The converters support most other common frameworks, except TensorFlow. More info on GitHub. The model zoo: https://github.com/nihui/ncnn-assets/tree/master/models.
Alibaba has recently launched MNN, a fast deep learning framework for mobile devices, with a small footprint. It is full of handcrafted ARM assembly optimizations. It uses Winograd convolutions when possible. Using the NEON registers and half-precision floating point (FP16), now available in the Raspberry Pi 4, further increases the speed. The library is written in C++. Alibaba MNN looks like Tencent's ncnn. They are almost identical. A minor difference is that MNN can run TensorFlow and TensorFlow Lite models (it supports 149 TensorFlow operations). MNN is the fastest framework for a Raspberry Pi 4 we have come across so far. MNN is on GitHub, with a model zoo. All other documents are at Yuque. The latest version of MNN also supports CUDA and TensorRT. This makes it the ideal choice for the Jetson Nano family.
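To give an idea of why Winograd convolutions are faster, the sketch below implements the smallest one-dimensional case, usually written F(2,3): two outputs of a 3-tap filter computed from four inputs. The direct method needs six multiplications, while the Winograd form needs four once the filter has been transformed in advance. This is only a toy illustration with function names of our own choosing. The kernels in MNN are two-dimensional and written in assembly, and we make no claim about which tile sizes MNN actually uses.

```python
def direct_conv(d, g):
    """Two outputs of a 3-tap filter over four inputs: 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def transform_filter(g):
    """Filter transform; done once per filter, so not counted per output tile."""
    return [
        g[0],
        (g[0] + g[1] + g[2]) / 2.0,
        (g[0] - g[1] + g[2]) / 2.0,
        g[2],
    ]


def winograd_f23(d, u):
    """Same two outputs from a transformed filter u: 4 multiplications."""
    m1 = (d[0] - d[2]) * u[0]
    m2 = (d[1] + d[2]) * u[1]
    m3 = (d[2] - d[1]) * u[2]
    m4 = (d[1] - d[3]) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    kernel = [1.0, 0.0, -1.0]
    print("direct  :", direct_conv(signal, kernel))
    print("winograd:", winograd_f23(signal, transform_filter(kernel)))
```

The saving grows for two-dimensional tiles, which is why the technique pays off for the 3x3 convolutions that dominate many networks.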
Internet giant Baidu is Google's Chinese counterpart. And just as Google promotes its TensorFlow, Baidu has PaddlePaddle. The lightweight version optimized for ARM cores is called Paddle Lite. This framework is currently one of the few with support for the Rockchip RK3399Pro with the neural network accelerator that is found on the Khadas VIM3 board, for example. It is also one of the few that supports CUDA and cuDNN, which makes it attractive for the NVIDIA boards. Paddle Lite works perfectly on a Raspberry Pi 4 with a 64-bit operating system.
One of the things we've noticed when working with Paddle Lite is the long compilation times. More than 3 minutes is not uncommon. There are two other small disadvantages. First, Paddle has its own model format, just like TensorFlow for that matter. There are some conversion tools, like Anakin, but it is unclear how well they work. Second, the documentation may be a problem, as it is almost entirely in Chinese.
ARM Holdings, the company behind the ARM cores found in most smartphones and Raspberry Pis, has developed a framework for these devices. They call it ARMnn. Recently the company donated the software to Linaro, where it is now open source. This framework is capable of running TensorFlow, TensorFlow Lite, Caffe or ONNX models without any converter tool. This makes interfacing easy.
However, there are some issues with this framework. It requires a lot of additional software, including a full installation of TensorFlow. In the end, ARMnn occupies 3.1 GB on your SD card. The software documentation is not very thorough. For instance, it is not very clear where to connect inputs and outputs to the tensors. And the framework is very slow, seven times slower than TensorFlow Lite under the same conditions. For those interested, the installation procedure and software examples can be found here.
MXNet + Gluon.
Apache MXNet is a full, mature framework just like TensorFlow. It is used for training and deploying all kinds of deep learning models. Apache MXNet comes with a high-level interface, Gluon, comparable to Keras for TensorFlow. There is no lite version of MXNet, nor is the software optimized for ARM cores. Not surprisingly, it is slow and needs a lot of the limited resources of the Raspberry Pi. The main reason we used MXNet is for converting deep learning models to ONNX format, which can be loaded into the fast and dedicated MNN framework. There is a lot of information on the net. See, for instance, https://mxnet.apache.org/.
Facebook's PyTorch is, just like TensorFlow, a mature framework. It can be applied for tensor calculations, like NumPy, or it can be used for training and deploying all kinds of deep learning models. PyTorch is heavily based on pure Python. There is no lite version of PyTorch, nor is the software optimized for ARM cores. At the moment, there is a beta C++ API version of PyTorch. However, since it is still under development, we did not install it. PyTorch is relatively slow and needs a lot of the limited resources of the Raspberry Pi. Since it is the second most popular framework after TensorFlow, it could not go unmentioned. A word about Caffe2. It was merged into PyTorch in 2018. However, it still exists and can offer a faster deployment of your deep learning models on a Raspberry Pi 4.
Of course, Caffe, grandmother of all frameworks, could not be missing from our overview. It is relatively simple to install on a Raspberry Pi. Written in C++ and with a Python interface, it is also not the slowest in our overview, although it is not in any way optimized for the ARM-NEON cores. We use the well-known weiliu89 fork with the SSD functionality. Since Caffe is widespread in universities, you can find many models on the net. See, for instance, https://github.com/BVLC/caffe/wiki/Model-Zoo.
Needless to say, you can only deploy deep learning models. Training new models is not an option on a Raspberry Pi.
Another point to note is that almost any other deep learning framework, such as ncnn, MNN or OpenCV DNN, can import Caffe models. The models then often run faster than in Caffe itself, because these frameworks are tailor-made for the ARM-NEON architecture.