
Install OpenCL on Raspberry Pi 3 B+
Introduction.
First of all, don't confuse OpenCL (an API for running parallel code on GPUs) with OpenCV (a computer vision library). If you were planning to install OpenCV, please follow our instructions on this page.
There is no official OpenCL version for the Raspberry Pi. The version we use here is the result of the master's thesis of Daniel Steadelmann (doe300 on GitHub). It is written for the Raspberry Pi 3 B+ only and does not support the full OpenCL command set.
If your software requires a full OpenCL implementation, as GluonCV does, you can consider installing PoCL. On a Raspberry Pi, PoCL does not use the GPU; instead, it runs the OpenCL code on the CPU. Needless to say, it will hardly accelerate your code.
Showstoppers.
First of all, this version works only on a Raspberry Pi 3 B+. The GPU of the Raspberry Pi 4 (VideoCore VI) differs greatly from the VideoCore IV in the Pi 3, and there is no detailed VideoCore VI datasheet, so no OpenCL version is available for the Pi 4 yet. However, a Vulkan driver for the Raspberry Pi 4 has recently become available. The install guide can be found here.
Secondly, this version supports only a subset of the OpenCL commands. That is understandable, given the amount of work it takes to write a full implementation.
As a consequence, this version does not work with OpenCV, in contrast to the OpenCL version for Mali GPUs.
Because the CPU and the GPU of the Raspberry Pi share the same memory chip, OpenCL code has direct access to memory that is also used by the operating system, and faulty code can corrupt it. For this reason, you need to run OpenCL programs as root, for instance with sudo.
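If you run OpenCL programs regularly, a small shell function can add sudo for you when needed. This is a minimal sketch, assuming a POSIX-compatible shell such as bash; the function name run_as_root and the program name my_opencl_app are hypothetical and not part of VC4CL.

```shell
# Hypothetical helper (not part of VC4CL): run a command directly when we
# are already root, otherwise prefix it with sudo.
run_as_root() {
    if [ "$(id -u)" -eq 0 ]; then
        "$@"
    else
        sudo "$@"
    fi
}

# Example use on the Pi (my_opencl_app is a placeholder name):
# run_as_root ./my_opencl_app
```

You could place the function in your ~/.bashrc. When you are already root, it simply runs the command unchanged.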
Finally, do not expect any miracles from the computing power of the VideoCore IV GPU. At best, it delivers about 24 GFLOPS.
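Where does the 24 GFLOPS come from? A back-of-the-envelope estimate multiplies the number of shader cores by the floating-point operations each one performs per clock cycle. The figures below are assumptions, not taken from this guide: the VideoCore IV is commonly described as having 12 QPUs (quad processing units), each handling 4 floats per clock, with one addition and one multiplication per lane per cycle, at a GPU clock of 250 MHz. The peak scales linearly with the GPU clock, and real kernels rarely come close to it.

```shell
# Back-of-the-envelope peak for the VideoCore IV (assumed figures):
qpus=12          # quad processing units on the GPU
lanes=4          # floats processed per QPU per clock (physical SIMD width)
ops=2            # one add and one multiply per lane per clock
clock_mhz=250    # assumed GPU clock in MHz

mflops=$((qpus * lanes * ops * clock_mhz))
gflops=$((mflops / 1000))
echo "peak: ${gflops} GFLOPS"   # prints "peak: 24 GFLOPS"
```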
Dependencies.
The OpenCL software depends on several third-party libraries, which have to be installed first. Some of them may already be present, but that doesn't matter: apt-get simply keeps or upgrades them to the latest version. To compile and run OpenCL-based code, you also need LLVM's Clang compiler. The default Raspbian GNU compilers (gcc and g++) cannot compile OpenCL kernel code; they only handle the host code that calls the library's API.
# get a fresh start
$ sudo apt-get update
$ sudo apt-get upgrade
# get third party software
$ sudo apt-get install cmake git
$ sudo apt-get install ocl-icd-opencl-dev ocl-icd-dev
$ sudo apt-get install opencl-headers
$ sudo apt-get install clinfo
$ sudo apt-get install libraspberrypi-dev
# get Clang compiler
$ sudo apt-get install clang clang-format clang-tidy
If everything went well, the following command should print the version of the installed Clang compiler.
$ clang --version
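To confirm that Clang, unlike gcc, accepts OpenCL C kernel code, you can let it syntax-check a tiny kernel. This is a minimal sketch: the kernel and file names are made up for this test, and the kernel deliberately avoids built-in functions such as get_global_id, so the check does not depend on how your Clang version provides their declarations.

```shell
# Write a tiny OpenCL C kernel to a temporary directory (names are made up).
tmpdir=$(mktemp -d)
cat > "$tmpdir/add.cl" <<'EOF'
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *c)
{
    c[0] = a[0] + b[0];
}
EOF

# -x cl treats the file as OpenCL C, -cl-std selects version 1.2,
# -fsyntax-only parses and type-checks without producing output files.
if command -v clang >/dev/null 2>&1 &&
   clang -x cl -cl-std=CL1.2 -fsyntax-only "$tmpdir/add.cl"; then
    result=ok
else
    result=failed
fi
rm -rf "$tmpdir"
echo "OpenCL C syntax check: $result"
```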

Now that everything is in place, we can install the OpenCL version from GitHub. We need three packages. But first, create a directory to store all the software.
$ mkdir -p ~/opencl
$ cd ~/opencl
$ git clone https://github.com/doe300/VC4CLStdLib.git
$ git clone https://github.com/doe300/VC4CL.git
$ git clone https://github.com/doe300/VC4C.git
All three packages must be built, and the procedure is identical for each of them. However, keep an eye on the order, because the packages depend on each other: first install the VC4CLStdLib library, then VC4C, and finally VC4CL.
# first VC4CLStdLib
$ cd ~/opencl/VC4CLStdLib
$ mkdir build
$ cd build
$ cmake ..
$ make
$ sudo make install
$ sudo ldconfig
# next VC4C
$ cd ~/opencl/VC4C
$ mkdir build
$ cd build
$ cmake ..
$ make
$ sudo make install
$ sudo ldconfig
# last VC4CL
$ cd ~/opencl/VC4CL
$ mkdir build
$ cd build
$ cmake ..
$ make
$ sudo make install
$ sudo ldconfig
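Because each package depends on the previous one, a failed build should stop the whole procedure. The steps above can be sketched as a small script that builds the packages in the required order and aborts at the first error. The function names build_package and build_all are hypothetical, and the script assumes the sources were cloned into ~/opencl as shown earlier.

```shell
# Packages in dependency order: VC4C needs VC4CLStdLib, VC4CL needs VC4C.
VC4_PACKAGES="VC4CLStdLib VC4C VC4CL"

# Hypothetical helper: configure, build and install one package.
# The subshell keeps the caller's working directory unchanged,
# and && stops at the first failing step.
build_package() {
    (
        cd "$HOME/opencl/$1" &&
        mkdir -p build &&      # -p: no error when re-running the script
        cd build &&
        cmake .. &&
        make &&
        sudo make install &&
        sudo ldconfig
    )
}

# Hypothetical driver: build all packages, abort on the first failure.
build_all() {
    # $VC4_PACKAGES is left unquoted on purpose, to split it into names
    for pkg in $VC4_PACKAGES; do
        echo "=== building $pkg ==="
        if ! build_package "$pkg"; then
            echo "building $pkg failed, stopping" >&2
            return 1
        fi
    done
}

# On the Pi, run: build_all
```

Running the packages through one function keeps the three identical procedures in a single place, while the list in VC4_PACKAGES makes the required order explicit.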
After all three packages have been installed successfully, you can check the installation with clinfo. The output should look like the listing below.
$ clinfo
Number of platforms                                 1
  Platform Name                                     OpenCL for the Raspberry Pi VideoCore IV GPU
  Platform Vendor                                   doe300
  Platform Version                                  OpenCL 1.2 VC4CL 0.4.9999
  Platform Profile                                  EMBEDDED_PROFILE
  Platform Extensions                               cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters
  Platform Extensions function suffix               VC4CL

  Platform Name                                     OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices                                   1
  Device Name                                       VideoCore IV GPU
  Device Vendor                                     Broadcom
  Device Vendor ID                                  0xa5c
  Device Version                                    OpenCL 1.2 VC4CL 0.4.9999
  Driver Version                                    0.4.9999
  Device OpenCL C Version                           OpenCL C 1.2
  Device Type                                       GPU
  Device Profile                                    EMBEDDED_PROFILE
  Device Available                                  Yes
  Compiler Available                                Yes
  Linker Available                                  No
  Max compute units                                 1
  Max clock frequency                               500MHz
  Core Temperature (Altera)                         33 C
  Device Partition                                  (core)
    Max number of sub-devices                       0
    Supported partition types                       None
    Supported affinity domains                      (n/a)
  Max work item dimensions                          3
  Max work item sizes                               12x12x12
  Max work group size                               12
  Preferred work group size multiple                1
  Preferred / native vector sizes
    char                                            16 / 16
    short                                           16 / 16
    int                                             16 / 16
    long                                            0 / 0
    half                                            0 / 0 (n/a)
    float                                           16 / 16
    double                                          0 / 0 (n/a)
  Half-precision Floating-point support             (n/a)
  Single-precision Floating-point support           (core)
    Denormals                                       No
    Infinity and NANs                               No
    Round to nearest                                No
    Round to zero                                   Yes
    Round to infinity                               No
    IEEE754-2008 fused multiply-add                 No
    Support is emulated in software                 No
    Correctly-rounded divide and sqrt operations    No
  Double-precision Floating-point support           (n/a)
  Address bits                                      32, Little-Endian
  Global memory size                                134217728 (128MiB)
  Error Correction support                          No
  Max memory allocation                             134217728 (128MiB)
  Unified memory for Host and Device                Yes
  Minimum alignment for any data type               64 bytes
  Alignment of base address                         512 bits (64 bytes)
  Global Memory cache type                          Read/Write
  Global Memory cache size                          32768 (32KiB)
  Global Memory cache line size                     64 bytes
  Image support                                     No
  Local memory type                                 Global
  Local memory size                                 134217728 (128MiB)
  Max number of constant args                       64
  Max constant buffer size                          134217728 (128MiB)
  Max size of kernel argument                       256
  Queue properties
    Out-of-order execution                          No
    Profiling                                       Yes
  Prefer user sync for interop                      Yes
  Profiling timer resolution                        1ns
  Execution capabilities
    Run OpenCL kernels                              Yes
    Run native kernels                              No
    IL version                                      SPIR-V_1.2 SPIR_1.2
    SPIR versions                                   1.2
  printf() buffer size                              0
  Built-in kernels                                  (n/a)
  Device Extensions                                 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_int16 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_vc4cl_performance_counters

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)             OpenCL for the Raspberry Pi VideoCore IV GPU
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)              Success [VC4CL]
  clCreateContext(NULL, ...) [default]                       Success [VC4CL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)      Success (1)
    Platform Name                                            OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                              VideoCore IV GPU
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)          No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)          Success (1)
    Platform Name                                            OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                              VideoCore IV GPU
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)       No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)          Success (1)
    Platform Name                                            OpenCL for the Raspberry Pi VideoCore IV GPU
    Device Name                                              VideoCore IV GPU

ICD loader properties
  ICD loader Name                                   OpenCL ICD Loader
  ICD loader Vendor                                 OCL Icd free software
  ICD loader Version                                2.2.12
  ICD loader Profile                                OpenCL 2.2
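If you want to check the installation from a script instead of reading the full listing, you can search the clinfo output for the VideoCore IV device. This is a minimal sketch; the function name has_vc4cl is hypothetical, and the device name is the one shown in the listing above. Given the root requirement mentioned at the start, you may need to run clinfo with sudo.

```shell
# Hypothetical helper: succeed if clinfo output read from stdin lists the
# VideoCore IV GPU. [[:space:]]* tolerates clinfo's column padding.
has_vc4cl() {
    grep -q 'Device Name[[:space:]]*VideoCore IV GPU'
}

# Example use on the Pi:
# sudo clinfo | has_vc4cl && echo "VC4CL is installed"
```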