Deep learning examples on Raspberry Pi 32/64-bit OS
Last updated: January 4, 2023
On this page, we focus on the software. Ultimately, you should be able to run the examples we present here yourself. They are all written in C++, one of the fastest programming languages. All use only a Raspberry Pi; no additional hardware is required. We start with a model overview. After that, a detailed hands-on section gives you all the information you need to run these examples yourself.
We regularly get asked whether we have an SD image of a Raspberry Pi 4 with pre-installed frameworks and deep-learning examples.
We are happy to comply with this request. Please find a complete working Raspberry Pi 4 image dedicated to deep learning on our GitHub page. Download the zip file from our GDrive site, unzip it, flash the image to a 16 GB SD card, and enjoy!
The overview speaks for itself. The highest frame rate measured comes from a Raspberry Pi 64-bit OS overclocked to 1950 MHz. The lowest is the standard 32-bit Raspbian at 1500 MHz. Frame rates are based only on the model run time (interpreter->Invoke()). Grabbing and preprocessing an image are not taken into account, nor is plotting output boxes or text. This way, models working with pictures instead of video streams can also be measured. The actual frame rate will be slightly lower when playing videos or using a camera.
This application detects multiple objects in a scene. The most commonly used models are SSD (Single Shot Detector) and YOLO (You Only Look Once). We have some examples of the YOLO versions on GitHub, but here we explore the TensorFlow Lite SSD because it is one of the fastest. The COCO SSD MobileNet v1 recognizes 80 different objects. It can detect up to ten objects in a single scene. Note also that the 64-bit version is suitable for both the Raspberry Pi 64-bit OS and Ubuntu 18.04 or 20.04.
This application tries to detect the outline of multiple objects in a scene. It is done with so-called semantic segmentation: a neural network attempts to associate every pixel in a picture with a particular subject. TensorFlow Lite has one segmentation model capable of classifying 20 different objects. Keep in mind that only reasonably sized objects can be recognized, not a scene of a highway with lots of tiny cars. Just like the previous example, the 64-bit version is suitable for both the Raspberry Pi 64-bit OS and Ubuntu 18.04 or 20.04.
This application estimates a person's pose in a scene. The deep learning model used recognizes elbows, knees, ankles, etc. TensorFlow Lite supports two models, a single-person and a multi-person version. We have only used the single-person model because it gives good results when the person is centred and in full view in a square-like image. The multi-person variant lacked robustness in our tests; only under optimal conditions did it give an acceptable outcome. As usual, the 64-bit version is suitable for both the Raspberry Pi 64-bit OS and Ubuntu 18.04 or 20.04.
The best-known application is, of course, the classification of objects. Google hosts a wide range of TensorFlow Lite models in its zoo, the so-called quantized models. The models are capable of recognizing 1000 different objects. Square images have been used to train the models. Therefore, the best results are obtained with a square-like input image. Our C++ example supports all TensorFlow Lite models from the zoo, as you shall see. Here too, the 64-bit version is suitable for both the Raspberry Pi 64-bit OS and Ubuntu 18.04 or 20.04.
Face mask detection.
This application detects if a person is wearing a face mask or not. Two phases are involved in this. First, all the faces are recognized and marked in the picture. Next, every face will be examined by a second deep learning network, which will predict if the person is wearing a mask.
The face recognition software is from Linzaer (linzai). The detection of the face mask is from Baidu. Both are in the public domain. You have to install two frameworks on the Raspberry Pi: ncnn and Paddle Lite. Unfortunately, Paddle Lite cannot be installed on a Raspberry Pi with a 32-bit operating system, only on a 64-bit OS, or on a Raspberry Pi 3. Ubuntu 18.04 or 20.04 on a Raspberry Pi 4 is also a possibility. The frame rate depends on the number of detected faces and, when overclocked to 1950 MHz, can be calculated as follows: FPS = 1.0 / (0.04 + 0.011 × #Faces).
Face mask detection 2.0.
Version 2.0 of the face mask detection application. It no longer uses the usual cascade of two deep learning models, one for face recognition and a second one that detects the masks, as was the case in the previous Paddle Lite version.
Using a re-trained TensorFlow Lite MobileNet V2 SSD model, it recognizes three classes: no mask, a mask, and wearing a mask incorrectly. The latter category is not very convincing, given the small number of training samples.
The model recognizes not only the white masks but also the black, coloured and fancy masks. An example is a girl with the monkey mask.
Although it can detect more faces/masks in the same scene, the best result is still one face in front of the camera.
This application detects faces in a video stream. It is fast, over 80 FPS on a bare Raspberry Pi 4 with a 64-bit OS.
We have examples for three frameworks. MNN or ncnn must be installed before running the app. The OpenCV app, on the other hand, has no other dependencies. As you can see in the graph below, the MNN 8-bit quantized models are very fast.
This application recognizes faces from a video stream. Faces and their landmarks are detected by an MtCNN network or by a RetinaFace network. The latter outperforms MtCNN but takes a few milliseconds extra. Once a face is found, the app looks for the matching face in a database.
As the recognition network, we use ArcFace. It is by far the best network that can recognize a face from a large data set.
The last step is an optional anti-spoofing check.
When no match is found, the face is added to the database automatically (optional). We have limited the database to 2000 items for now. If you reach higher numbers, let us know.
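The lookup described above can be sketched as follows. This is a simplified illustration and not the code of our app: we assume that faces are compared by the cosine similarity of their feature vectors, which is common practice with ArcFace, and the names Embedding, FaceDatabase and the threshold of 0.5 are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Embedding = std::vector<float>;   // feature vector of one face

// Cosine similarity between two feature vectors.
float cosine_similarity(const Embedding& a, const Embedding& b)
{
    float dot = 0.f, na = 0.f, nb = 0.f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    if (na == 0.f || nb == 0.f) return 0.f;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

struct FaceDatabase {
    std::size_t max_items = 2000;   // limit mentioned in the text
    float threshold = 0.5f;         // hypothetical acceptance threshold
    std::vector<Embedding> faces;

    // Returns the index of the best match, or -1 when no face is similar enough.
    int find(const Embedding& query) const
    {
        int best = -1;
        float best_score = threshold;
        for (std::size_t i = 0; i < faces.size(); ++i) {
            const float s = cosine_similarity(query, faces[i]);
            if (s > best_score) { best_score = s; best = static_cast<int>(i); }
        }
        return best;
    }

    // Looks up a face; an unknown face is added as long as there is room.
    int find_or_add(const Embedding& query)
    {
        const int idx = find(query);
        if (idx >= 0) return idx;
        if (faces.size() >= max_items) return -1;
        faces.push_back(query);
        return static_cast<int>(faces.size()) - 1;
    }
};
```

The linear search is the reason a size limit matters: every new frame is compared against all stored faces, so the lookup time grows with the database.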
Face recognition with mask detection.
This application is identical to the above face recognition app. The only difference is the additional face mask recognition. As is known, recognition becomes very uncertain when a mask covers a large part of the face. An additional warning can help now.
A well-known family of deep learning models for detecting different objects in a scene are the YOLO networks. YOLO stands for You Only Look Once. Joseph Redmon developed the original YOLO model, which evolved with each new release. After YOLOV3, Joseph Redmon stopped his work on the project. He has ethical and moral objections to certain potential negative side effects of his work. Think of the military embracing deep learning techniques. However, Alexey Bochkovskiy continued working on new ideas for YOLO. He and his co-authors released YOLO version 4 in April 2020. A few months later, Glenn Jocher, who had worked on porting YOLOV3 to PyTorch, released his own version of YOLO with a group of researchers, calling it version 5. Also notable is YoloX, an excellent model that is well suited to edge devices like the Raspberry Pi. All versions are on our GitHub page.
It seems like a simple task, tracking an object as it moves through a video scene. However, it turns out to be very complicated, especially when objects intersect or are (temporarily) hidden behind other objects (occlusion). Several algorithms are available, such as SORT, DeepSORT, FairMOT and TransMOT. Here we use the ByteTrack method because it is a lightweight algorithm, well suited for edge devices, and gives good results. It uses, among other things, a Kalman filter to predict the future location, given the movement history.
An object tracker is only as good as the underlying object recognition model. In the case of YoloX, we have achieved impressive results. TensorFlow Lite SSD, with its less robust outcome, has a higher FPS, but the object tracking is of disappointing quality.
While TensorFlow Lite segmentation was still a simple model, YoloV5-seg or Yolact is of a completely different order. It is complex and very precise. Even objects of the same class are distinguished from each other, as can be seen from the different zebras in the picture. The input dimension is large (550x550 pixels). The model distinguishes 80 different classes. All this comes with a price; it is not fast on a bare Raspberry Pi 4.
Head pose estimation.
This deep learning model tries to estimate the position of the head. First, all faces in the image are detected. Then the landmarks, such as eyes, nose and mouth corners, are determined. The 2D landmarks now found are projected onto a 3D model of the face. Together with the estimated 3D position of the camera, a so-called PnP algorithm calculates the relative head position. These kinds of algebraic solvers need a minimum of six points. In our simple routine, the sixth point is an interpolation of the location of the chin. It will, of course, affect the accuracy. More accurate models work with up to 66 landmarks. However, they consume more of the Raspberry Pi's limited resources and will be much slower than the fast 20 FPS we're reaching now on a bare RPi.
A great deep learning application is the detection of text in an image. It runs on the OpenCV DNN module, so no other framework needs to be installed. PaddlePaddle has used it as the basis for their PaddleOCR application.
One final note, it's not intended for a Raspberry Pi. That's why it will be slow. It takes a few seconds to complete one image. On the other hand, we thought it was too good not to mention.
Super-resolution tries to enlarge images without losing detail. A blurred area in an image, such as clouds, is smoothed out, while high-detail and high-contrast areas are best enlarged with sharp edges. Deep learning is widely used to solve the often contrasting demands of super-resolution.
Here the award-winning RealSR solution from the Tencent YouTu Lab is presented. In a word, the results of this deep learning model are stunning. Details that have never been noticed in the small image are suddenly clearly visible.
The only drawback is the use of the Vulkan drivers due to the use of the ncnn framework. Unfortunately, it is only possible to run the software on a Jetson Nano. For the Raspberry Pi, we will have to wait until the preliminary Vulkan driver is further developed.
One of the most fascinating applications of deep learning is generating a colour image from a black-and-white picture, the so-called colourization. It is done by a GAN network: a Generative Adversarial Network.
A characteristic of this GAN is that the output has the same size as the input. Often the topology follows an hourglass: from large to narrow and then fanned out again. The algorithm works with the Lab colour space.
In this space, the L component represents perceptual lightness. It is (almost) equal to the black-and-white input. That leaves the colour components a and b to be estimated, which the network does based on the scene. It takes a lot of calculations to process an image. On a Raspberry Pi 4, it takes about 10 seconds.
Another amazing application of the GAN network is facial reconstruction. With a blurred image of a face as input, the network is able to reconstruct an impressively accurate, natural output. At its core is a super-resolution network with special attention to human facial features, such as the eyes, mouth, nose and ears. As always with GANs, it takes a lot of computation. On the Raspberry Pi, it takes 44 seconds for a single face or over 4 minutes for an entire scene of faces.
YoloCam is a software package that transforms your Raspberry Pi into a stand-alone, AI-powered camera. It runs on a Raspberry Pi 4, 3B+ or even on a Raspberry Pi Zero 2W, making it the cheapest camera with fully functional deep-learning capabilities.
You can define what actions YoloCam performs when it recognizes an object. For instance, it can send you an email, make a movie and store it on GDrive, or activate one of its GPIO pins. At the same time, you can view the live feed in any browser.
Installation is simple. Just download the software and flash it to an SD card. Once inserted into your Raspberry Pi, everything works right away.
The software comes with the latest Raspberry Pi Bullseye operating system. You don't need to be able to program. However, the C++ source code used is available on the image.
Install the software.
We start with the installation of all necessary software. It's a bit more complicated than simply downloading a file, as you can see in the checklist below.
- Operating system.
- Debian version.
- Installing OpenCV.
- Installing Code::Blocks.
- Installing framework.
- Downloading software.
- Compiling software.
- Running the example.
Read this section carefully before downloading any software.
32 or 64-bit operating system.
The first decision to make is the operating system you want to use. Most models work on either a 32 or a 64-bit operating system. Obviously, the models will run faster on a 64-bit OS. If you already have a 32-bit Raspbian operating system and want a taste of deep learning, keep using it and download the relatively small ncnn framework. If you would like to run more complex models, you can migrate to the 64-bit operating system. Keep in mind that the migration involves a brand new installation of your Raspberry Pi. You will lose all software on your SD card, so it is better to use a new card for your adventure. There are examples, such as face mask detection, that only work on a 64-bit RPi. The guide is found here: Install 64 bit OS on Raspberry Pi 4.
Buster or Bullseye.
The next decision is the choice of your Debian version. Until recently, only Debian 10, Buster, was available.
In early November 2021, the Raspberry Pi team released Bullseye, the Debian 11 operating system. Both are suitable for running your deep learning app.
Two comments. Bullseye no longer supports the Raspicam; libcamera has taken its place. While the Raspicam is easily accessible from OpenCV, in Python or in C++ with cv::VideoCapture cap(0), libcamera works in OpenCV only through a streamer like GStreamer. We'll cover this in the next section.
Also, keep in mind that it is still early days; no doubt, there will be many updates in the coming months.
By the way, there is a legacy release of Raspicam for Bullseye if you want to use the 'old' stack.
The next decision is overclocking. Increasing the clock frequency of your Raspberry Pi makes applications run faster. It has a price: heat. Without a heat sink, the Raspberry Pi protects itself in no time by automatically lowering the clock frequency. If you don't have a heat sink, don't overclock. With proper cooling, you might consider overclocking. We always run our RPi at about 1900 MHz, sometimes 1950 MHz. Higher frequencies are sure to crash your Raspberry Pi when running demanding tasks like deep learning apps, despite some enthusiastic authors who claim they reach 2200 MHz. They may manage to boot the Raspberry Pi at that frequency and then immediately write their article. Safe overclocking is covered here.
You must have OpenCV installed on your Raspberry Pi to work with video and other image-related tasks. Some may have OpenCV already on their RPi. Please check whether your version is C++ compatible with the following commands. All three must show some location on your disk.
If your package is installed with $ sudo apt-get install or $ pip3 it will not work as these are Python versions. And as you know, speed and Python don't go hand in hand. In that case, install OpenCV all over again.
We have written many guides on how to install OpenCV on your RPi. It doesn't matter which version you use, as long as it is version 4, for example, this manual about version 4.5.0. Pay attention to which operating system you are using, 32 or 64 bit. Each requires a different installation.
One last tip; if you're planning to use a camera, install GStreamer before OpenCV. This way, GStreamer is fully integrated into OpenCV.
You need a good IDE to write your C++ program. You could use Geany as it comes with the Raspbian operating system. However, Geany cannot handle projects, only individual files. In the end, you mess with Make to integrate all the different files into one executable. Second, Geany has limited debugging tools.
We are going to use Code::Blocks. The IDE can handle multi-file projects and has excellent debugging features like the inspection of variables, threads or CPU registers. The IDE is relatively simple and intuitive to understand. The following command in your terminal will allow you to install Code::Blocks.
If you have enough memory, you can install the optional 'codeblocks-contrib' plugins containing valgrind, a library finder, an additional spellchecker, and more.
$ sudo apt-get install codeblocks
# you can install some optional plugins.
$ sudo apt-get install codeblocks-contrib
Before starting straight away with a deep learning network, it might be an idea to check your installation and your skills so far by doing one of the examples on this page. It also gives you more information on how to handle errors. You'll find it comforting to have James Bond riding his motorcycle on your screen. At least then you know you're halfway through with the software.
We are now going to install the Face Mask Detection software, as one of our favoured examples on GitHub. In the dependencies list, found in the Readme.md, you read that the application uses two frameworks: ncnn and Paddle-Lite. The ncnn framework detects faces in an image. Paddle-Lite determines whether or not the person is wearing a mask. Also important, due to Paddle-Lite, this application can only run on a Raspberry Pi with a 64-bit operating system.
So let's start. If you don't have a 64-bit operating system, you had better start all over again by installing the proper OS and OpenCV version. Next, install the ncnn framework, as suggested, according to the page mentioned in the Readme.md. Of course, if you already have ncnn on your RPi, there is no need to install it again. Last, install Paddle-Lite if you haven't done so already. Be sure to use our installation guides because all examples expect to find the directory structure used in these guides.
At this point, you have the following software on the Raspberry Pi. It will take about 22.2 GByte of disk space.
- 64-bit operating system
- OpenCV 64-bit
Downloading the deep learning example.
Once all software is available, you can download the software from the GitHub page. In the Readme.md you will find the instructions. For the sake of clarity, we will repeat them here.
# create a folder called software (if not already done)
$ mkdir ~/software
$ cd ~/software
# create a folder, for instance, face to store the app
$ mkdir face
$ cd face
# download the package
$ wget https://github.com/Qengineering/Face-Mask-Detection-Raspberry-Pi-64-bits/archive/master.zip
$ unzip -j master.zip
# you may want to remove files that are no longer needed
$ rm master.zip
$ rm LICENSE.txt
$ rm README.md
If you use the download button on GitHub instead, you end up with the zip file in your Downloads folder. With some trial and error, you will certainly get the file extracted to a location of your choice.
Compiling the software.
The next step is compiling the software so it can run on your Raspberry Pi. Open Code::Blocks and load the project file (*.cbp), in this case MaskUltra.cbp. It not only loads the CPP files but also sets all environment options correctly. Select the Release option and compile the example.
Running the software.
Hopefully, the compilation is successful and you can run the Face Mask Detection application.
Most deep learning C++ examples work with a single picture or video stream. We often provide an mp4 movie illustrating the functionality of the app.
If you want to use a camera in your application, you need to change some code to get it to work. Usually, people use the cv::VideoCapture routine to get a live video instead of the movie. Below you see a code snippet.
As expected, simplicity comes with a price. It only works well if your deep learning processing time, called latency, is very low. Otherwise, the captured images will be queued, causing significant delays and leaving the video out of sync. And, more importantly, cv::VideoCapture cap(0) no longer works with the Raspberry Pi camera on the new Bullseye operating system, now that libcamera has replaced the Raspicam stack.
Better to use streaming software like GStreamer. It is supported by Buster and Bullseye and works without buffering. Your live images are always near real-time, with no significant lag, even if you're only processing two frames per second.
Another way is to use a libcamera C++ API wrapper if you have the Bullseye OS. This method requires even less CPU usage than the GStreamer solution, leaving more for the actual deep learning app.
Getting the camera working with GStreamer involves more code. The best way to get started is to build one of the four GStreamer apps on our GitHub page. Which one you need depends on the operating system you are using. There are 32 and 64-bit versions for both Debian 10 and 11.
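To give an idea of what the extra GStreamer code revolves around, the sketch below builds a pipeline description as a string. It is a minimal sketch and not necessarily the pipeline used in our GitHub apps: the helper name gstreamer_pipeline is hypothetical, and we assume Bullseye with the libcamerasrc element installed.

```cpp
#include <string>

// Builds a GStreamer pipeline description for a libcamera source (assumed: Bullseye).
// The appsink at the end is the element through which OpenCV receives the frames.
std::string gstreamer_pipeline(int width, int height, int framerate)
{
    return "libcamerasrc ! video/x-raw,"
           " width=(int)" + std::to_string(width) + ","
           " height=(int)" + std::to_string(height) + ","
           " framerate=(fraction)" + std::to_string(framerate) + "/1 !"
           " videoconvert ! appsink";
}
```

The resulting string can be handed to OpenCV with cv::VideoCapture cap(pipeline, cv::CAP_GSTREAMER), provided OpenCV was built with GStreamer support. On Buster, a different source element is needed, which is one reason why there are separate apps per operating system.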
YoloX + GStreamer camera.
Let's make an example. You have YoloX, and you want to connect a camera. We use the original Raspicam, the one many people will use.
We assume you have YoloX up and running in Code::Blocks on a 64-bit Bullseye machine. Also, you have the camera working with the GStreamer 64-bit Bullseye example on your Raspberry Pi.
It comes down to merging two Code::Blocks projects (YoloX and GStreamer) into one. The best place to start is with the GStreamer app. Keep the YoloX project next to it. There are now four steps involved.
If you look closely at the GStreamer app, you'll see initialization code that sets up the pipeline, followed by the main loop. The loop starts with a while statement and continuously displays captured frames on the screen with imshow(). This is where the object detection of the YoloX code must be inserted.
The YoloX app has a lot more code. Find the main() procedure, usually the last routine in the cpp file. The first part of this routine loads an image. Then a few lines initialize the ncnn framework with the YoloX deep learning model. After loading the model, the actual detection of the objects in the image is done. Insert this part into the while loop just before the imshow() statement. As you might suspect, the camera's newly captured frame is analyzed, and the result is plotted in the scene before displaying it.
In the GStreamer code, the captured images are stored in the cv::Mat frame. In the YoloX app, it's just m. When you put the two together, you have to decide which name to use, m or frame. In the example above, m is used.
Don't make the mistake of putting the yolox.load_param() in the while loop. There is no reason to reload the same data over and over. It will only slow down the throughput. On the other hand, if you also keep the std::vector<Object> objects out of the loop, the list will not be reset and grow continuously.
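The placement rule above can be sketched without any framework. The Detector below is a hypothetical stand-in for the YoloX network and not ncnn code; it only counts how often the model is loaded and appends one fake detection per frame, so the effect of each placement is visible.

```cpp
#include <cstddef>
#include <vector>

struct Object { float x, y, w, h; int label; float prob; };

// Hypothetical stand-in for the YoloX network.
struct Detector {
    int loads = 0;
    void load_model() { ++loads; }   // expensive in real life: do this once
    void detect(int /*frame*/, std::vector<Object>& out) const
    {
        out.push_back(Object{0.f, 0.f, 10.f, 10.f, 0, 0.9f});   // one fake detection per frame
    }
};

// Processes `frames` frames and returns the number of objects found in the last one.
std::size_t run_loop(Detector& net, int frames)
{
    net.load_model();                      // outside the loop: loaded only once
    std::size_t last_count = 0;
    for (int f = 0; f < frames; ++f) {
        std::vector<Object> objects;       // inside the loop: starts empty for every frame
        net.detect(f, objects);
        last_count = objects.size();
    }
    return last_count;
}
```

Moving the objects declaration above the for statement would make the count grow by one with every frame, which is exactly the continuously growing list the text warns about.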
Before detecting objects, YoloX needs initializing. Look at the original code in YoloX.cpp. There are only three lines involved in setting up YoloX. Place them above your while loop. The loading of the image (imread) is no longer needed, nor is the check on the arguments (if (argc != 2)).
The next action is to merge all declarations at the beginning of the cpp file.
Note the absence of cv::imread(imagepath, 1); in the declaration of cv::Mat m. We don't read a file anymore. GStreamer will fill the image now.
Look also at the size of the image. We purposely changed the resolution. YoloX can handle large images. 640x640 is a decent size compared to other deep learning models. For example, EfficientNet-B0 only works with 224x224. The ncnn framework will internally shrink the 1280x720 image to fit the required 640x640 before feeding the deep learning network. YoloX can detect the objects, even if the scene is cut in half. Finally, the output is scaled back to its original size.
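The arithmetic behind shrinking the image and scaling the output back is straightforward. The sketch below assumes an aspect-preserving resize, where the longest side is fitted to the network input; the names Box, fit_scale and to_original are hypothetical and not taken from the YoloX code.

```cpp
#include <algorithm>

struct Box { float x, y, w, h; };

// Scale factor that fits a width x height image inside a target x target square
// while keeping the aspect ratio (assumed resize strategy).
float fit_scale(int width, int height, int target)
{
    return std::min(target / static_cast<float>(width),
                    target / static_cast<float>(height));
}

// Maps a box from network coordinates back to the original image.
Box to_original(const Box& b, float scale)
{
    return Box{b.x / scale, b.y / scale, b.w / scale, b.h / scale};
}
```

For a 1280x720 frame and a 640x640 input the scale is 0.5, so the frame becomes 640x360 and every detected box is doubled in size when it is mapped back to the original frame.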
Do not use high frame rates. If you expect to process 2 frames per second then there is no reason for a 30 FPS stream as it only consumes CPU power.
By the way, let ncnn resize the image and not GStreamer. GStreamer is much slower as it creates high-quality images not needed for deep learning.
Now that the main routine is ready, you can add all the remaining YoloX (sub)routines to the cpp file. Make sure they stay in the same order. Without an associated header (*.hpp), the order of appearance is imperative for the compiler.
Once done, you have to merge all the #include lines. It's nothing more than copying the headers from YoloX.cpp to the new app. Avoid duplication.
With the code ready, the next step is the environment. You have to tell the compiler where it can find all the headers and libraries.
In fact, the whole action is merging the two Code::Blocks project files into one. The code below speaks for itself.
One could walk through all the menu options to set the correct settings. It is better to modify the Code::Blocks project files (*.cbp) offline with a text editor. As seen in the example above, you need to copy and paste the <compiler> and <linker> sections. Again, avoid duplicate entries.
Time to compile and see what happens. Usually, you can expect two types of errors.
The first type is the easiest to fix: missing files. Files are in different folders than indicated, or needed libraries are not found because the search directories are missing. Find more information on how to solve this type of problem in the next section, a short tour of Code::Blocks.
The second type of error is code related. It requires some experience with C++ coding. Usually, Google and StackOverflow are the sources of answers. Below is an issue we ran into while merging GStreamer with YoloX: cannot use 'typeid' with '-fno-rtti'. It is a typical ncnn-related error, as RTTI was disabled during the ncnn installation. As a result, the -fno-rtti flag is set in the project, which, in turn, generates the error. Removing the flag from the compiler options list resolves the issue.
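The typeid issue described above can be reproduced in isolation. The sketch below is our own minimal illustration and not code from ncnn or YoloX; the Layer and Convolution types are hypothetical. It compiles with default g++ settings, but g++ rejects the typeid expression as soon as -fno-rtti is added to the compiler options.

```cpp
#include <typeinfo>

struct Layer { virtual ~Layer() = default; };   // polymorphic base class
struct Convolution : Layer {};

// Uses run-time type information (RTTI) to inspect the dynamic type of `layer`.
// With -fno-rtti the compiler refuses this typeid expression.
bool is_convolution(const Layer& layer)
{
    return typeid(layer) == typeid(Convolution);
}
```

If you cannot remove the flag, the alternative is to avoid typeid and dynamic_cast in the code that is compiled with it.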
We've posted this entire project on GitHub, so you've got a good starting point. It is exemplary for connecting a camera to a deep learning model.
YoloX + libcamera API wrapper.
If you have Bullseye installed, you could use the C++ libcamera API wrapper instead of GStreamer. The wrapper is a little faster than GStreamer. Also, it requires somewhat less CPU power. The merging procedure doesn't differ from the one with GStreamer, no need to repeat it.
As with the GStreamer, we've now also put a repo on GitHub with the libcamera wrapper so you can experiment. By the way, make sure you have libcamera working before you start.
Working with Code::Blocks always involves the same steps. All our GitHub projects have project files (*.cbp). Once loaded in Code::Blocks, all environment options are automatically set correctly.
Follow this procedure if you want to start with a blank project. First, you load your source code into the IDE. Second, you have to give the folder where the necessary headers are. Do this with the menu option Project → Build options.
Select the tab sheet Search directories and, under Compiler, give the locations of the headers used. Note: do not select the Debug or Release option, but select the project name instead, as can be seen below.
The next step is to specify the libraries used and the linker flags. Again, use menu option Project → Build options, but now select tab sheet Linker settings. Here the linker settings are shown as used in the Face detection app.
Just like the headers, you must give the location of the used libraries now. Do this in tab sheet Search directories, tab Linker.
You are ready with most settings. However, some applications may require additional command line parameters during startup. Give these in menu option Project → Set programs' arguments. Select your target (Debug or Release) and give the arguments just like you would on the command line. Below are some examples.
Instead of using the GUI, you can modify all settings in the Code::Blocks project file (*.cbp). The file is in XML format and is very readable. Before you make any modification, close the project in Code::Blocks first. This way any change you made will be loaded in Code::Blocks when you reopen your project.