There are several models for parallel programming. The OpenCL programming interface for GPUs focuses on one of them: the Data Parallel Single Program Multiple Data (SPMD) model. In this model the programmer writes one program or function (called a "kernel" in OpenCL) that is executed many times in parallel. Every instance sees the same input data, and each instance receives a parameter telling it which portion of that data to operate on.
In the OpenCL model, the separate instances of the kernel to be run are arranged in a 1D, 2D, or 3D rectangular "range". For example, if you want to color a chessboard in one step on 64 processors, you could run the following Bacon kernel:
#define BLACK 0
#define WHITE 1
kernel
Array2D<uchar>
color_chess_board()
{
SETUP:
    Array2D<uchar> squares[8, 8];
BODY:
    @range [8, 8];
    // Adjacent squares alternate, so the parity of row + col picks the color.
    squares[$row, $col] = (($row + $col) % 2 == 0) ? BLACK : WHITE;
    return squares;
}
The basic syntax should be familiar from C-family languages. There are only three funny bits:
When you run the Bacon compiler on this code, it will generate a C++ class that can be linked into your C++ program. Calling the "color_chess_board" method on an instance of this class will return a Bacon::Array2D object containing the expected data.
The parallelism exposed by the OpenCL programming model (and thus by Bacon) is easiest to use in cases like this one, where you want to generate an array whose elements can be computed completely independently, either directly from the output address itself or through read-only access to some input data.
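To make the execution model concrete, here is a minimal plain C++ sketch that emulates the chessboard computation serially. It is not what the Bacon compiler generates. The two nested loops stand in for the 8 by 8 range, and the hypothetical function color_square plays the role of the kernel body. The names Board, color_square, and color_chess_board_serial are made up for this illustration, and the sketch assumes adjacent squares alternate in color.

```cpp
#include <array>

// Illustrative constants mirroring the kernel's BLACK and WHITE.
constexpr unsigned char BLACK = 0;
constexpr unsigned char WHITE = 1;

// Hypothetical board type for this sketch; not a Bacon type.
using Board = std::array<std::array<unsigned char, 8>, 8>;

// Stand-in for the kernel body: the result depends only on the
// output position, so every call is independent of the others.
unsigned char color_square(int row, int col)
{
    return ((row + col) % 2 == 0) ? BLACK : WHITE;
}

// Serial emulation of an 8x8 range: one call per (row, col) pair.
// On a GPU these 64 calls could all run at the same time.
Board color_chess_board_serial()
{
    Board squares{};
    for (int row = 0; row < 8; ++row) {
        for (int col = 0; col < 8; ++col) {
            squares[row][col] = color_square(row, col);
        }
    }
    return squares;
}
```

Because no call to color_square reads anything another call writes, the loop iterations can be executed in any order, which is the property that makes this kind of problem easy to run in parallel.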
If you multiply two matrices A and B with dimensions (m*p) and (p*n) respectively, you get a third (m*n) matrix C with the value of each entry defined as follows:
$$C_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}$$
Note that the value for each entry in C can be independently calculated from only its position and read-only access to the values in A and B. This makes this problem a perfect candidate for easy implementation in Bacon.
// MatrixMultiply.bc
kernel
Array2D<float>
matrix_multiply(Array2D<float> aa, Array2D<float> bb)
{
SETUP:
    Array2D<float> cc[aa.rows, bb.cols];
BODY:
    @range [cc.rows, cc.cols];
    // Dot product of row $row of aa with column $col of bb.
    float sum = 0.0;
    for (int kk = 0; kk < aa.cols; ++kk) {
        sum += aa[$row, kk] * bb[kk, $col];
    }
    cc[$row, $col] = sum;
    return cc;
}
The body is executed once for each cell in C and the value is computed exactly as shown in the formula above.
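When testing a kernel like this, it can help to have a serial CPU version to compare results against. The following is a minimal sketch in plain C++, not part of Bacon: the Matrix struct and matrix_multiply_reference are hypothetical names introduced here, and the sketch assumes row-major storage and that aa.cols equals bb.rows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type for this sketch; not a Bacon type.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // rows * cols entries, row-major

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Serial reference: the two outer loops stand in for the range, and the
// inner loop is the same dot product the kernel body computes.
// Assumes aa.cols == bb.rows; no dimension check is performed.
Matrix matrix_multiply_reference(const Matrix& aa, const Matrix& bb)
{
    Matrix cc(aa.rows, bb.cols);
    for (std::size_t row = 0; row < cc.rows; ++row) {
        for (std::size_t col = 0; col < cc.cols; ++col) {
            float sum = 0.0f;
            for (std::size_t kk = 0; kk < aa.cols; ++kk) {
                sum += aa.at(row, kk) * bb.at(kk, col);
            }
            cc.at(row, col) = sum;
        }
    }
    return cc;
}
```

Since floating-point sums can differ slightly depending on evaluation order and hardware, comparing the GPU result against this reference with a small tolerance is usually safer than testing for exact equality.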
This compute kernel can be called from C++ with code like the following:
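// mmul_test.cc
#include "gen/MatrixMultiply.hh"
// generate_test_matrix() and output_result() are not shown here;
// they are assumed to be helpers defined elsewhere in your program.
int
main(int argc, char* argv[])
{
    MatrixMultiply mm;
    Bacon::Array2D aa = generate_test_matrix();
    Bacon::Array2D bb = generate_test_matrix();
    Bacon::Array2D cc = mm.matrix_multiply(aa, bb);
    output_result(cc);
    return 0;
}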
This project could be compiled using commands like those below. You'll probably want to use a Makefile.
bacon MatrixMultiply.bc
(cd gen && make)
g++ -c -o mmul_test.o mmul_test.cc `bacon --ccflags`
g++ -o mmul_test *.o gen/*.o `bacon --ldflags`
And that's Bacon. Take a look at the examples for more clues.