
Deep learning with OpenCV on Raspberry Pi 4
Last updated: November 8, 2021
Introduction.
This page shows you how to run a deep learning model with OpenCV. The C++ examples are written for the Raspberry Pi 4, but they compile without any modification on any other platform. We only guide you through the basics, so in the end you are capable of building your own application. For more technical information, see https://docs.opencv.org/4.2.0/d6/d0f/group__dnn.html.
Tip.

By popular request, you can find a complete working Raspberry Pi 4 image dedicated to deep learning on our GitHub page. Download the zip file from our GDrive site, unzip and flash the image onto a 16 GB SD card, and enjoy!
Tools.
To start with, you of course need OpenCV installed. Please follow our instructions on this page. Next, you need a user-friendly environment to build your application. We use Code::Blocks instead of Geany, because the latter does not support projects with multiple files, as Code::Blocks does. Code::Blocks can be installed very simply with the following command.
$ sudo apt-get install codeblocks
Code (Caffe models).
Once installed, the C++ code below can be loaded inside the editor, or you can download the whole project from our GitHub page. The model used here is the MobileNetV1-SSD Caffe network from chuanqi305. A lot of other models can be found at modelzoo.co. However, keep in mind that most deep learning models are very resource-hungry, which may cause problems when you run them on a Raspberry Pi. You can temporarily increase the memory swap space to make more memory available, as explained here. We do not advise this technique. The swap space is intended for occasional context switches, such as going from LibreOffice to your browser and back, not for heavy calculations, where the enormous number of single read and write actions wears out the SD card. And on top of that, swapping memory slows the application down.
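For completeness, this is roughly how the swap size is enlarged on Raspberry Pi OS with the dphys-swapfile service. The 2048 MB value is only an illustration; if you try it anyway, restore the original setting afterwards for the reasons given above.
$ sudo nano /etc/dphys-swapfile        (set CONF_SWAPSIZE=2048)
$ sudo /etc/init.d/dphys-swapfile stop
$ sudo /etc/init.d/dphys-swapfile start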
#include <stdio.h>
#include <iostream>
#include <chrono>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/core/ocl.hpp>

using namespace cv;
using namespace std;

const size_t width = 300;
const size_t height = 300;
const float scaleFactor = 0.007843f;
const float meanVal = 127.5;

dnn::Net net;

const char* class_video_Names[] = { "background",
    "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair",
    "cow", "diningtable", "dog", "horse",
    "motorbike", "person", "pottedplant",
    "sheep", "sofa", "train", "tvmonitor" };

Mat detect_from_video(Mat &src)
{
    Mat blobimg = dnn::blobFromImage(src, scaleFactor, Size(300, 300), meanVal);

    net.setInput(blobimg, "data");
    Mat detection = net.forward("detection_out");
    // cout << detection.size[2] << " " << detection.size[3] << endl;
    Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());

    const float confidence_threshold = 0.25;
    for(int i=0; i<detectionMat.rows; i++){
        float detect_confidence = detectionMat.at<float>(i, 2);

        if(detect_confidence > confidence_threshold){
            size_t det_index = (size_t)detectionMat.at<float>(i, 1);
            float x1 = detectionMat.at<float>(i, 3)*src.cols;
            float y1 = detectionMat.at<float>(i, 4)*src.rows;
            float x2 = detectionMat.at<float>(i, 5)*src.cols;
            float y2 = detectionMat.at<float>(i, 6)*src.rows;
            Rect rec((int)x1, (int)y1, (int)(x2 - x1), (int)(y2 - y1));
            rectangle(src, rec, Scalar(0, 0, 255), 2, 8, 0);
            putText(src, format("%s", class_video_Names[det_index]), Point(x1, y1 - 5), FONT_HERSHEY_SIMPLEX, 1.0, Scalar(0, 0, 255), 2, 8, 0);
        }
    }
    return src;
}

int main(int argc, char **argv)
{
    float f;
    float FPS[16];
    int i, Fcnt = 0;
    Mat frame;
    chrono::steady_clock::time_point Tbegin, Tend;

    for(i=0; i<16; i++) FPS[i] = 0.0;    //initialise the frame rate buffer

    net = dnn::readNetFromCaffe("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel");
    if(net.empty()){
        cout << "init the model net error";
        exit(-1);
    }

    //cout << "Switched to " << (cv::ocl::useOpenCL() ? "OpenCL enabled" : "CPU") << endl;
    //net.setPreferableTarget(DNN_TARGET_OPENCL);

    cout << "Start grabbing, press ESC on Live window to terminate" << endl;
    while(1){
        frame = imread("004545.jpg");    //need to refresh frame before dnn class detection
        Tbegin = chrono::steady_clock::now();

        detect_from_video(frame);

        Tend = chrono::steady_clock::now();
        //calculate frame rate
        f = chrono::duration_cast<chrono::milliseconds>(Tend - Tbegin).count();
        if(f > 0.0) FPS[((Fcnt++) & 0x0F)] = 1000.0 / f;
        for(f = 0.0, i = 0; i < 16; i++){ f += FPS[i]; }
        putText(frame, format("FPS %0.2f", f / 16), Point(10, 20), FONT_HERSHEY_SIMPLEX, 0.6, Scalar(0, 0, 255));

        //show output
        imshow("frame", frame);

        char esc = waitKey(5);
        if(esc == 27) break;
    }
    cout << "Closing the camera" << endl;
    destroyAllWindows();
    cout << "Bye!" << endl;
    return 0;
}
A few words about this code.
For a successful compilation, you need to tell Code::Blocks where it can find the necessary OpenCV libraries and headers. Please follow the steps of our OpenCV camera example here. It also gives you enough clues on how to process live camera images with this network.
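If you prefer the command line over the IDE, a roughly equivalent build command would look like the one below; we assume here that pkg-config knows about your OpenCV 4 installation and that MobileNet.cpp is whatever you named the source file above.
$ g++ -O3 MobileNet.cpp -o MobileNet $(pkg-config --cflags --libs opencv4)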
Keep your MobileNetSSD_deploy.prototxt, MobileNetSSD_deploy.caffemodel and 004545.jpg files in your working folder. This will be the same folder as your executable. Only when running the app from the Code::Blocks IDE do you need to place these files in the working directory of Code::Blocks itself, where the project file and the obj and bin folders are located. See again the explanation at bullet 24 here, where we have the same situation with the james.mp4 movie.
If all goes well, you should get the same screen dump as we have. OpenCV is remarkably fast; a nice 3.66 FPS on a bare Raspberry Pi 4 is certainly not bad. If you overclock carefully, you can even reach 4.5 FPS.
Most important is to declare your network as a global variable. It is called many times by several routines, and making it global gives you the least overhead.
Another important point is to load your topology and weights only once, because it takes a lot of time. Since they do not change during their lifetime, it makes no sense to reload them every time a new image is presented to the model, a mechanism often seen in other examples.
Because there are only 20 classes to detect in the VOC2007 set, they are hardcoded. If you have more classes, load the list from a file once before you start processing the pictures, as shown in the TensorFlow example in the next paragraph.
Just above the grabbing loop in main, you see a commented-out test to determine whether OpenCL is available. If so, you can accelerate the application by uncommenting the next line. The Raspberry Pi has (yet) no OpenCL library capable of running with OpenCV. If you have another machine with CUDA installed, you can replace DNN_TARGET_OPENCL with DNN_TARGET_CUDA to speed up the calculations with the CUDA library, as sketched below.
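As a sketch, the backend and target selection would look like this; note that DNN_BACKEND_CUDA and DNN_TARGET_CUDA only exist when your OpenCV (version 4.2 or later) was built with CUDA support.
//check whether an OpenCL runtime is present
if(cv::ocl::useOpenCL()){
    net.setPreferableBackend(dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(dnn::DNN_TARGET_OPENCL);
}
//on a machine with a CUDA build of OpenCV, use instead:
//net.setPreferableBackend(dnn::DNN_BACKEND_CUDA);
//net.setPreferableTarget(dnn::DNN_TARGET_CUDA);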
Although OpenCV is fast, we try to prevent needless copying of large memory blocks as much as possible. Only one Mat object holding the image will do. Transfer it to the subroutine by reference (Mat &src) instead of passing it by value (Mat src), which generates a copy behind the scenes. Also, never return a large object like a picture from a subroutine, since that will also pass on a copy. All these techniques improve your frame rate in the end.
Everything else is more or less self-explanatory, standard C++ coding. Some additional code calculates the frame rate as a moving average over the last sixteen frames; you can prune it.

Code (TensorFlow models).
The DNN module of OpenCV also supports TensorFlow. Let's run some examples. Download the whole project with the frozen deep learning models from our GitHub page. The two models tested are the MobileNetV1-SSD and MobileNetV2-SSD. Both models are trained on the COCO dataset, which has many more classes (90) than the previously used VOC2007 set (20). As can be seen below, the code is almost identical to the Caffe implementation.
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <chrono>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/core/ocl.hpp>

using namespace cv;
using namespace std;

const size_t width = 300;
const size_t height = 300;

dnn::Net net;
std::vector<std::string> Names;

static bool getFileContent(std::string fileName)
{
    // Open the file
    std::ifstream in(fileName.c_str());
    // Check if the object is valid
    if(!in.is_open()) return false;

    std::string str;
    // Read lines from the file until it reaches the end
    while(std::getline(in, str))
    {
        // If the line has a length > 0, save it in the vector
        if(str.size() > 0) Names.push_back(str);
    }
    // Close the file
    in.close();
    return true;
}

Mat detect_from_video(Mat &src)
{
    Mat blobimg = dnn::blobFromImage(src, 1.0, Size(300, 300), 0.0, true);

    net.setInput(blobimg);
    Mat detection = net.forward("detection_out");
    // cout << detection.size[2] << " " << detection.size[3] << endl;
    Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());

    const float confidence_threshold = 0.25;
    for(int i=0; i<detectionMat.rows; i++){
        float detect_confidence = detectionMat.at<float>(i, 2);

        if(detect_confidence > confidence_threshold){
            size_t det_index = (size_t)detectionMat.at<float>(i, 1);
            float x1 = detectionMat.at<float>(i, 3)*src.cols;
            float y1 = detectionMat.at<float>(i, 4)*src.rows;
            float x2 = detectionMat.at<float>(i, 5)*src.cols;
            float y2 = detectionMat.at<float>(i, 6)*src.rows;
            Rect rec((int)x1, (int)y1, (int)(x2 - x1), (int)(y2 - y1));
            rectangle(src, rec, Scalar(0, 0, 255), 1, 8, 0);
            putText(src, format("%s", Names[det_index].c_str()), Point(x1, y1 - 5), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255), 1, 8, 0);
        }
    }
    return src;
}

int main(int argc, char **argv)
{
    float f;
    float FPS[16];
    int i, Fcnt = 0;
    Mat frame;
    chrono::steady_clock::time_point Tbegin, Tend;

    for(i=0; i<16; i++) FPS[i] = 0.0;    //initialise the frame rate buffer

    //MobileNetV1
    net = dnn::readNetFromTensorflow("frozen_inference_graph_V1.pb", "ssd_mobilenet_v1_coco_2017_11_17.pbtxt");
    //MobileNetV2
    //net = dnn::readNetFromTensorflow("frozen_inference_graph_V2.pb", "ssd_mobilenet_v2_coco_2018_03_29.pbtxt");
    if(net.empty()){
        cout << "init the model net error";
        exit(-1);
    }

    // Get the class names
    bool result = getFileContent("COCO_labels.txt");
    if(!result)
    {
        cout << "loading labels failed";
        exit(-1);
    }

    //cout << "Switched to " << (cv::ocl::useOpenCL() ? "OpenCL enabled" : "CPU") << endl;
    //net.setPreferableTarget(DNN_TARGET_OPENCL);

    cout << "Start grabbing, press ESC on Live window to terminate" << endl;
    while(1){
        frame = imread("Traffic.jpg");    //need to refresh frame before dnn class detection
        Tbegin = chrono::steady_clock::now();

        detect_from_video(frame);

        Tend = chrono::steady_clock::now();
        //calculate frame rate
        f = chrono::duration_cast<chrono::milliseconds>(Tend - Tbegin).count();
        if(f > 0.0) FPS[((Fcnt++) & 0x0F)] = 1000.0 / f;
        for(f = 0.0, i = 0; i < 16; i++){ f += FPS[i]; }
        putText(frame, format("FPS %0.2f", f / 16), Point(10, 20), FONT_HERSHEY_SIMPLEX, 0.6, Scalar(0, 0, 255));

        //show output
        imshow("frame", frame);

        char esc = waitKey(5);
        if(esc == 27) break;
    }
    cout << "Closing the camera" << endl;
    destroyAllWindows();
    cout << "Bye!" << endl;
    return 0;
}
The same remarks as for the Caffe code apply here: declare your network globally and load it only once, together with the class labels.
OpenCL or CUDA acceleration is an option if available. For compilation, see the remarks above.
Again, OpenCV is remarkably fast; a nice 4.94 FPS on a Raspberry Pi 4 is extremely good, certainly if you bear in mind that it classifies 90 different objects. Version V2 is somewhat slower, but on the other hand somewhat more accurate.


pbtxt file generation.
OpenCV needs a pbtxt topology file when running a TensorFlow model. If it is not provided, the file has to be generated. OpenCV has some tools for this purpose; you can find them here on GitHub. Download all tf_text_graph_*.py files and store them in a folder. Move the frozen_inference_graph.pb file and the pipeline.config file to the same location. Now you can run the appropriate script, depending on the type of model you are using. Below is an example of how we generate MobileNetV1_075_SSD.pbtxt on a Raspberry Pi.
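A minimal sketch of such a session; we assume Python 3 with TensorFlow installed, and that the scripts, the frozen graph and pipeline.config sit in the same folder.
$ python3 tf_text_graph_ssd.py --input frozen_inference_graph.pb --config pipeline.config --output MobileNetV1_075_SSD.pbtxt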
