Install OpenCL on Raspberry Pi 3 B+ - Q-engineering

Install OpenCL on Raspberry Pi 3 B+

Introduction.

First of all, don't confuse OpenCL (a GPU library) with OpenCV (a Computer Vision library). If you were planning to install OpenCV, please follow our instructions on this page.
There is no official OpenCL version for the Raspberry Pi. The version we use here is the result of the master's thesis of Daniel Steadelmann (doe300 on GitHub). This OpenCL version is written for the Raspberry Pi 3 B+ only and doesn't support the full OpenCL command set.
If your software requires a full version, like GluonCV, you can consider installing PoCL. On a Raspberry Pi, it will not use the GPU; instead, it simulates OpenCL on the CPU. Needless to say, it will hardly accelerate your code.
Showstopper.
First of all, this version works only on a Raspberry Pi 3 B+. Because the GPU of the Raspberry Pi 4 differs greatly from that of the Pi 3, and because a detailed VideoCore VI datasheet is lacking, no OpenCL is available for the Pi 4 yet. However, a Vulkan version has recently become available for the Raspberry Pi 4. The install guide can be found here.
Secondly, the version only supports a subset of all OpenCL commands. That is understandable, given the work it takes to write a full version.
The consequence of all this is that the version does not work with OpenCV, in contrast to the MALI version above.
Because the Raspberry Pi uses the same memory chip for the CPU and the GPU, OpenCL code can modify memory used by your operating system. You therefore need to execute OpenCL code as root or as superuser (sudo).
Finally, do not expect any miracles from the computing power of the VideoCore IV GPU. In the end, it gives you about 24 GFLOPS.
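For a feeling of where a number like 24 GFLOPS comes from, here is a small sketch of the usual peak-throughput arithmetic. The hardware figures (12 QPUs, 4 physical SIMD lanes per QPU, one add plus one multiply per cycle, a 250 MHz clock) are assumptions taken from commonly quoted VideoCore IV specifications, not from this page, and the actual clock differs per board and configuration. The helper name peak_gflops is made up for this example.

```shell
#!/bin/sh
# Theoretical peak = QPUs x SIMD lanes x operations per cycle x clock (MHz) / 1000.
# peak_gflops is a hypothetical helper; the hardware figures passed to it
# are assumptions, not measurements.
peak_gflops() {
    echo $(( $1 * $2 * $3 * $4 / 1000 ))
}

# 12 QPUs, 4 lanes each, 2 operations (one add, one multiply) per cycle, 250 MHz
peak_gflops 12 4 2 250   # prints 24
```

This is a theoretical ceiling; real kernels will normally stay well below it.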

Dependencies.

The OpenCL software uses other third-party software libraries, which have to be installed first. Some of them may already be installed, but that doesn't matter; the installation procedure always keeps the latest version. To compile and run OpenCL-based code, you also need LLVM's Clang compiler. The default Raspbian GNU compilers (gcc and g++) don't support OpenCL code, only the API calls to the library.
# get a fresh start
$ sudo apt-get update
$ sudo apt-get upgrade
# get third party software
$ sudo apt-get install cmake git
$ sudo apt-get install ocl-icd-opencl-dev ocl-icd-dev
$ sudo apt-get install opencl-headers
$ sudo apt-get install clinfo
$ sudo apt-get install libraspberrypi-dev
# get Clang compiler
$ sudo apt-get install clang clang-format clang-tidy
If everything went well, your screen should look something like this after running the clang --version command.
Successful installation clang
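If you prefer a quick scripted check over inspecting the screen, the sketch below lists which of the tools from the apt-get commands above are still missing from the PATH. The helper name missing_tools is made up for this example; only the tool names come from this page.

```shell
#!/bin/sh
# Print every command from the argument list that is not on the PATH.
# missing_tools is a hypothetical helper, not part of any package above.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools installed by the apt-get commands in this section.
missing=$(missing_tools cmake git clang clinfo)
if [ -z "$missing" ]; then
    echo "all build tools found"
else
    echo "missing:" $missing
fi
```

An empty list only tells you the programs are present; it does not check the development libraries such as ocl-icd-opencl-dev.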
Now that everything is in place, we can start installing the OpenCL version from GitHub. We need three packages. But first, make a directory in which to store all the software.
$ mkdir -p ~/opencl
$ cd ~/opencl
$ git clone https://github.com/doe300/VC4CLStdLib.git
$ git clone https://github.com/doe300/VC4CL.git
$ git clone https://github.com/doe300/VC4C.git
All three packages must be built. The procedure is identical for each of them. However, keep an eye on the order. First, the VC4CLStdLib library must be installed, then VC4C and finally VC4CL. They are interdependent.
# first VC4CLStdLib
$ cd ~/opencl/VC4CLStdLib
$ mkdir build
$ cd build
$ cmake ..
$ make
$ sudo make install
$ sudo ldconfig

# next VC4C
$ cd ~/opencl/VC4C
$ mkdir build
$ cd build
$ cmake ..
$ make
$ sudo make install
$ sudo ldconfig

# last VC4CL
$ cd ~/opencl/VC4CL
$ mkdir build
$ cd build
$ cmake ..
$ make
$ sudo make install
$ sudo ldconfig
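Because the three builds are identical apart from the directory, the steps above can also be generated by a short loop. The sketch below only prints the commands in the required order (VC4CLStdLib, VC4C, VC4CL); it does not run them. The function name print_build_steps and the OPENCL_DIR variable are made up for this example, and the directory defaults to the ~/opencl folder created earlier.

```shell
#!/bin/sh
# Print the build commands for the three packages in dependency order.
# Nothing is executed here; pipe the output to `sh -e` to run it.
# OPENCL_DIR and print_build_steps are hypothetical names for this sketch.
OPENCL_DIR="${OPENCL_DIR:-$HOME/opencl}"

print_build_steps() {
    # The order matters: VC4CLStdLib first, then VC4C, then VC4CL.
    for pkg in VC4CLStdLib VC4C VC4CL; do
        echo "cd \"$OPENCL_DIR/$pkg\""
        echo "mkdir -p build && cd build"
        echo "cmake .. && make && sudo make install && sudo ldconfig"
    done
}

print_build_steps
```

To run the generated steps, save the script under a name of your choice and pipe its output to sh -e, so the sequence stops at the first failing command instead of installing a later package on top of a broken earlier one.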
After all three packages have been successfully installed, you can check the installation with clinfo, as you can see below.
Number of platforms                               1
 Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
 Platform Vendor                                 doe300
 Platform Version                                OpenCL 1.2 VC4CL 0.4.9999
 Platform Profile                                EMBEDDED_PROFILE
 Platform Extensions                             cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
 Platform Extensions function suffix             VC4CL

 Platform Name                                   OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices                                 1
 Device Name                                     VideoCore IV GPU
 Device Vendor                                   Broadcom
 Device Vendor ID                                0xa5c
 Device Version                                  OpenCL 1.2 VC4CL 0.4.9999
 Driver Version                                  0.4.9999
 Device OpenCL C Version                         OpenCL C 1.2
 Device Type                                     GPU
 Device Profile                                  EMBEDDED_PROFILE
 Device Available                                Yes
 Compiler Available                              Yes
 Linker Available                                No
 Max compute units                               1
 Max clock frequency                             500MHz
 Core Temperature (Altera)                       33 C
 Device Partition                                (core)
   Max number of sub-devices                     0
   Supported partition types                     None
   Supported affinity domains                    (n/a)
 Max work item dimensions                        3
 Max work item sizes                             12x12x12
 Max work group size                             12
 Preferred work group size multiple              1
 Preferred / native vector sizes                 
   char                                                16 / 16      
   short                                               16 / 16      
   int                                                 16 / 16      
   long                                                 0 / 0       
   half                                                 0 / 0        (n/a)
   float                                               16 / 16      
   double                                               0 / 0        (n/a)
 Half-precision Floating-point support           (n/a)
 Single-precision Floating-point support         (core)
   Denormals                                     No
   Infinity and NANs                             No
   Round to nearest                              No
   Round to zero                                 Yes
   Round to infinity                             No
   IEEE754-2008 fused multiply-add               No
   Support is emulated in software               No
   Correctly-rounded divide and sqrt operations  No
 Double-precision Floating-point support         (n/a)
 Address bits                                    32, Little-Endian
 Global memory size                              134217728 (128MiB)
 Error Correction support                        No
 Max memory allocation                           134217728 (128MiB)
 Unified memory for Host and Device              Yes
 Minimum alignment for any data type             64 bytes
 Alignment of base address                       512 bits (64 bytes)
 Global Memory cache type                        Read/Write
 Global Memory cache size                        32768 (32KiB)
 Global Memory cache line size                   64 bytes
 Image support                                   No
 Local memory type                               Global
 Local memory size                               134217728 (128MiB)
 Max number of constant args                     64
 Max constant buffer size                        134217728 (128MiB)
 Max size of kernel argument                     256
 Queue properties                                
   Out-of-order execution                        No
   Profiling                                     Yes
 Prefer user sync for interop                    Yes
 Profiling timer resolution                      1ns
 Execution capabilities                          
   Run OpenCL kernels                            Yes
   Run native kernels                            No
   IL version                                    SPIR-V_1.2 SPIR_1.2
   SPIR versions                                 1.2
 printf() buffer size                            0
 Built-in kernels                                (n/a)
 Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_int16 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters

NULL platform behavior
 clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  OpenCL for the Raspberry Pi VideoCore IV GPU
 clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [VC4CL]
 clCreateContext(NULL, ...) [default]            Success [VC4CL]
 clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
   Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
   Device Name                                   VideoCore IV GPU
 clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
 clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
   Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
   Device Name                                   VideoCore IV GPU
 clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
 clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
 clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
   Platform Name                                 OpenCL for the Raspberry Pi VideoCore IV GPU
   Device Name                                   VideoCore IV GPU

ICD loader properties
 ICD loader Name                                 OpenCL ICD Loader
 ICD loader Vendor                               OCL Icd free software
 ICD loader Version                              2.2.12
 ICD loader Profile                              OpenCL 2.2
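The listing is long, so a scripted check can be handy. The sketch below looks for two lines from the output above, the platform vendor doe300 and the device name VideoCore IV GPU, and reports whether both are present. The helper name has_vc4cl is made up for this example, and the exact clinfo layout may differ between clinfo versions, so treat the patterns as an assumption.

```shell
#!/bin/sh
# Succeed when clinfo output on stdin lists the VC4CL platform and its GPU device.
# has_vc4cl is a hypothetical helper; the matched strings come from the listing above.
has_vc4cl() {
    output=$(cat)
    echo "$output" | grep -q 'Platform Vendor  *doe300' &&
    echo "$output" | grep -q 'Device Name  *VideoCore IV GPU'
}

# Only call clinfo when it is installed; run this script with sudo on the Pi,
# because OpenCL code needs root rights there.
if command -v clinfo >/dev/null 2>&1; then
    if clinfo 2>/dev/null | has_vc4cl; then
        echo "VC4CL platform found"
    else
        echo "VC4CL platform not found (try running this script with sudo)"
    fi
fi
```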