Hi oF! This is a short primer on the Accelerate framework, what it is, why you would use it, and how to do so.

**What is it?**

Accelerate is a collection of functions and objects you can use to get a big speed boost when you’re working with sets of data (arrays, vectors, etc). You can think of it as a collection of pre-made ‘for’ loops that are built to make the most out of your CPU (if “SIMD” means anything to you, Accelerate makes use of SSE on Intel processors and NEON on iOS). These functions do all manner of standard arithmetic operations (“multiply everything by 5”), analysis (“what’s the biggest value?”), utility (sorting, absolute values, casting…) and your standard DSP stuff (FFT, BiQuad filtering, correlation, convolution…).

**Why / When?**

Accelerate fills the gap between your everyday ‘for’ loop and more involved GPU-based techniques like OpenCL. Working with audio is the ideal scenario in my experience (i.e. it never needs to touch the GPU, there’s usually a lot of data flying around, and it needs to be done ASAP). That said, any time you’ve got to crunch a lot of numbers, it’s probably worth considering Accelerate.

Another reason to use Accelerate is if you’re doing a fair amount of calculations on an iOS device, since it takes power usage into account. As in, your app will typically use less battery if you do your calculations with Accelerate.

The time NOT to use Accelerate is when you’re working with data that’s already on the GPU (e.g. textures). In that case you’ll probably be much better served by a shader. A hefty portion of Accelerate is the vImage library, which I personally haven’t found too much use for because of this. This guide focuses entirely on the vDSP part of Accelerate.

**How?**

First, add Accelerate.framework to your project. Then, `#include <Accelerate/Accelerate.h>` in files where you want to make use of it.

Here’s an example of an Accelerate function. This will multiply everything in the array “input” by 5 and store the results in the array “output”:

```
const int array_size = 3;
float input[array_size] = {1,2,3};
float output[array_size];
float factor = 5;
vDSP_vsmul(input, 1, &factor, output, 1, array_size);
```

So what’s worth noticing here?

- Accelerate’s function names look like a particularly nerdy cat walked across your keyboard.
- Accelerate typically doesn’t work “in place”. Meaning, operations usually take values from one array, work on them, then store the results in another. (EDIT: It seems you can actually use the same array as both input and output, though the Accelerate docs don’t seem to mention this explicitly).
- What’s up with those 1s in the function call?

First, the function names. There are two terms in particular that you’ll need to know to work with Accelerate:

scalar = one value

vector = one or more values

Note that Accelerate’s definition of “vector” is not the same as C++'s std::vector (though a std::vector can work as an Accelerate vector). As far as Accelerate is concerned, a “vector” is just a pointer to some values.

Breaking the function name `vDSP_vsmul` into parts, you get:

- “vDSP” = This function is part of the “vDSP” section of Accelerate
- “v” = it works on vectors
- “s” = it uses a scalar
- “mul” = it does multiplication

Knowing this, it stands to reason that `vDSP_vsadd` and `vDSP_vsdiv` are the same as `vDSP_vsmul`, but do addition and division respectively.

A couple more characters Accelerate uses:

- D = works on doubles (as opposed to floats). Ex. `vDSP_vsmulD`
- i = works on ints. Ex. `vDSP_vsaddi`

Secondly, Accelerate typically operates “out of place”. This means that just about every Accelerate function will take one or more inputs, and store the results somewhere else. When Accelerate requests an “input vector” or an “output vector” you can use one of a few things. Namely:

- arrays
- a `std::vector` (to get the pointer Accelerate wants, use `&myVector[0]`)
- some other chunk of contiguous data (e.g. ofPixels)
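To make the “a vector is just a pointer” idea concrete, here is a minimal sketch in plain C++. `sumValues` and `sumArrayAndVector` are hypothetical helpers (they are not part of Accelerate); they just take data the same way vDSP functions do, as a pointer to the first value plus an element count. For a non-empty `std::vector`, `data()` gives the same pointer as `&myVector[0]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Accelerate: it receives a "vector" the way
// vDSP functions do, as a pointer to the first value plus an element count.
float sumValues(const float* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// The same helper accepts a plain array or the storage of a std::vector.
float sumArrayAndVector() {
    float plainArray[3] = {1.0f, 2.0f, 3.0f};
    std::vector<float> stdVector = {4.0f, 5.0f, 6.0f};
    return sumValues(plainArray, 3) +
           sumValues(stdVector.data(), stdVector.size());
}
```

Anything that hands out a pointer to contiguous floats can be passed along in the same way.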

Thirdly, what’s with those extra 1s in the `vDSP_vsmul` call above? In case you forgot, it looked like this:

```
vDSP_vsmul(input, 1, &factor, output, 1, array_size);
```

The 1s here represent the “stride” of the data in the array. A stride of 1 means that all of the values are right next to each other in memory. In ASCII terms, this:

`[ v ][ v ][ v ][ v ][ v ][ v ][ v ]`

Where `v` represents a value. Why does Accelerate bother with this? Well, this lets you extract certain values from a dataset that uses a more complex packing scheme. For example, an ofPixels object can represent RGB data like this in memory:

`[ r ][ g ][ b ][ r ][ g ][ b ][ r ][ g ][ b ]`

If you wanted to operate on just the red values, for example, you could pass in a stride of 3. This would mean “work on every 3rd value”, starting from the first.
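Here is a rough plain-loop picture of what a stride does, assuming interleaved RGB floats. `scaleStrided` and `scaledRedChannel` are made-up names for illustration; the loop only shows the indexing and is not how Accelerate implements it internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-loop stand-in for a strided "vector times scalar" operation.
// Hypothetical helper: it illustrates the indexing only.
void scaleStrided(const float* input, std::size_t inputStride,
                  float factor,
                  float* output, std::size_t outputStride,
                  std::size_t count) {
    assert(inputStride >= 1 && outputStride >= 1);
    for (std::size_t i = 0; i < count; ++i) {
        output[i * outputStride] = input[i * inputStride] * factor;
    }
}

// Scales only the red values of interleaved RGB data and returns them
// tightly packed.
std::vector<float> scaledRedChannel(const std::vector<float>& rgb, float factor) {
    assert(rgb.size() % 3 == 0);
    const std::size_t pixelCount = rgb.size() / 3;  // one red value per pixel
    std::vector<float> reds(pixelCount);
    // Read every 3rd value (stride 3), write contiguously (stride 1).
    scaleStrided(rgb.data(), 3, factor, reds.data(), 1, pixelCount);
    return reds;
}
```

Note that the count is the number of values actually processed (one per pixel here), not the length of the whole array.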

I’ll demonstrate some more interesting uses of Accelerate in a bit. But first! About that “speed” thing.

Here are the results of a simple test I ran on my laptop (MacBook Pro, mid 2009, Core 2 Duo, 2.66GHz), using the “Release” Xcode build scheme. The test loads some random values into an array, finds the max value, then divides the entire array by the max value to map it to the 0 to 1 range. The times are in seconds (i.e. 1.0 would be one second).

```
Values: 220500 For loop: 0.0015861 Accelerate: 0.000147832
Values: 441000 For loop: 0.0032297 Accelerate: 0.000890234
Values: 661500 For loop: 0.00487999 Accelerate: 0.00182216
Values: 882000 For loop: 0.00656921 Accelerate: 0.00307542
Values: 1102500 For loop: 0.00826815 Accelerate: 0.0034329
Values: 1323000 For loop: 0.014751 Accelerate: 0.00446377
Values: 1543500 For loop: 0.0115801 Accelerate: 0.00548144
Values: 1764000 For loop: 0.0130506 Accelerate: 0.00529263
Values: 1984500 For loop: 0.0146551 Accelerate: 0.00612535
Values: 2205000 For loop: 0.016566 Accelerate: 0.00694328
Values: 2425500 For loop: 0.0183522 Accelerate: 0.00821662
Values: 2646000 For loop: 0.0202095 Accelerate: 0.00862035
Values: 2866500 For loop: 0.0211966 Accelerate: 0.00892981
Values: 3087000 For loop: 0.0236832 Accelerate: 0.00998177
```

That’s a little over twice as fast with Accelerate. More complex operations with bigger data sets will show more of a benefit for Accelerate.

So then, some more complex operations.

These examples assume that there are 3 float arrays that exist already, called A, B and C. A, B and C could also be std::vectors, in which case they are used as `&A[0]` instead of just passing them into the functions directly. These examples also assume that there is a variable called `data_size`, which represents the size of the arrays (in terms of elements, so a `float[5]` would have a `data_size` of 5).

Here are the max value & divide functions I used for the test above:

```
float max_value;
// store the maximum value from array A in variable max_value
vDSP_maxv(A, 1, &max_value, data_size);
// divide each value in array A by max_value, store the results
// in array B. (A is unchanged after this)
vDSP_vsdiv(A, 1, &max_value, B, 1, data_size);
```
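For comparison, a for-loop version of the same two steps could look roughly like this. It is a sketch and not the exact loop from the timing test; `normalizeByMax` is a hypothetical name, and it assumes a non-empty array whose max is not zero (a zero max would fill the output with inf/NaN).

```cpp
#include <cassert>
#include <cstddef>

// Plain-loop counterpart of "find the max, then divide everything by it".
// Hypothetical helper; returns the maximum so the caller can inspect it.
float normalizeByMax(const float* input, float* output, std::size_t count) {
    assert(count > 0);

    // Pass 1: find the largest value.
    float maxValue = input[0];
    for (std::size_t i = 1; i < count; ++i) {
        if (input[i] > maxValue) {
            maxValue = input[i];
        }
    }

    // Pass 2: divide every value by that maximum (input is left unchanged).
    for (std::size_t i = 0; i < count; ++i) {
        output[i] = input[i] / maxValue;
    }
    return maxValue;
}
```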

This adds each value in A to the value in B with the same index, then stores the results in C. For example, if A = `[2, 5]` and B = `[3, 10]`, this would make C = `[5, 15]`.

```
vDSP_vadd(A, 1, B, 1, C, 1, data_size);
```

This generates an array which ramps from one value to another. For example, if `data_size` is 5, this would fill A with `[5, 7.5, 10, 12.5, 15]`.

```
float start = 5;
float end = 15;
vDSP_vgen(&start, &end, A, 1, data_size);
```
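The ramp is evenly spaced, so each value is `start + i * (end - start) / (data_size - 1)`. A plain-loop sketch of that formula (with `fillRamp` as a hypothetical helper name, and assuming at least two values) reproduces the example numbers above:

```cpp
#include <cassert>
#include <cstddef>

// Plain-loop illustration of an evenly spaced ramp from start to end.
// Hypothetical helper; needs at least 2 values so the step is well defined.
void fillRamp(float start, float end, float* output, std::size_t count) {
    assert(count >= 2);
    const float step = (end - start) / static_cast<float>(count - 1);
    for (std::size_t i = 0; i < count; ++i) {
        output[i] = start + static_cast<float>(i) * step;
    }
}
```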

This calculates the average (mean) of the values in A.

```
float mean;
vDSP_meanv(A, 1, &mean, data_size);
```

This stores clamped values from A into B. For example, if A = `[1,2,3,4,5]`, this would make B = `[2,2,3,4,4]`.

```
float min = 2;
float max = 4;
vDSP_vclip(A, 1, &min, &max, B, 1, data_size);
```

This calculates the FFT for an audio signal stored in A. This assumes A is a buffer holding 1024 audio samples. FFTs with Accelerate should typically be for data sets whose size is a power of 2 (i.e. 256, 512, 1024, 2048…).

```
// Setup -------------
// You should do this once, and keep these variables for subsequent FFTs.
UInt32 log2N = 10; // 1024 samples
UInt32 N = (1 << log2N);
FFTSetup FFTSettings = vDSP_create_fftsetup(log2N, kFFTRadix2);
COMPLEX_SPLIT FFTData;
FFTData.realp = (float *) malloc(sizeof(float) * N/2);
FFTData.imagp = (float *) malloc(sizeof(float) * N/2);
float * hammingWindow = (float *) malloc(sizeof(float) * N);
// create an array of floats to represent a hamming window
vDSP_hamm_window(hammingWindow, N, 0);
// FFT Time ----------
// Moving data from A to B via hamming window
vDSP_vmul(A, 1, hammingWindow, 1, B, 1, N);
// Converting data in B into split complex form, i.e. real and imaginary
// parts stored in two separate arrays (realp and imagp)
vDSP_ctoz((COMPLEX *) B, 2, &FFTData, 1, N/2);
// Doing the FFT
vDSP_fft_zrip(FFTSettings, &FFTData, 1, log2N, kFFTDirection_Forward);
// calculating square of magnitude for each value
vDSP_zvmags(&FFTData, 1, FFTData.realp, 1, N/2);
// At this point, FFTData.realp is an array of 512 FFT values (1024/2).
// Cleanup -----------
// You should do this only when you're done doing FFTs.
vDSP_destroy_fftsetup(FFTSettings);
free(FFTData.realp);
free(FFTData.imagp);
free(hammingWindow);
```
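The “split complex” step is probably the least obvious part. With an input stride of 2, the 1024 real samples get treated as 512 interleaved pairs: even-indexed samples land in `realp` and odd-indexed samples land in `imagp`. Roughly, in plain C++ (the `SplitComplex` struct and `packSplitComplex` are hypothetical stand-ins, not Accelerate types):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a split complex buffer: two separate arrays.
struct SplitComplex {
    std::vector<float> realp;
    std::vector<float> imagp;
};

// Plain-loop illustration of packing interleaved samples into split form:
// samples[0], samples[2], ... go to realp; samples[1], samples[3], ... to imagp.
SplitComplex packSplitComplex(const std::vector<float>& samples) {
    assert(samples.size() % 2 == 0);
    const std::size_t pairCount = samples.size() / 2;
    SplitComplex packed;
    packed.realp.resize(pairCount);
    packed.imagp.resize(pairCount);
    for (std::size_t i = 0; i < pairCount; ++i) {
        packed.realp[i] = samples[2 * i];
        packed.imagp[i] = samples[2 * i + 1];
    }
    return packed;
}
```

This is why `realp` and `imagp` only need room for N/2 floats each.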

Here’s rudimentary pitch detection. This is an addendum to the previous FFT example. It picks up after the `vDSP_zvmags` call above (before cleanup). Note that if you want serious pitch detection, you’ll have to delve into the DSP world a bit more.

```
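// Doing an inverse FFT. (FFT -> magnitude squared -> IFFT = autocorrelation, sort of)
// The squared magnitudes are purely real, so clear the leftover imaginary parts first.
vDSP_vclr(FFTData.imagp, 1, N/2);
vDSP_fft_zrip(FFTSettings, &FFTData, 1, log2N, kFFTDirection_Inverse);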
// Storing the autocorrelation results in B
vDSP_ztoc(&FFTData, 1, (COMPLEX *)B, 2, N/2);
// Calculating the zero-crossings in B. A "zero-crossing" is when a
// signal goes from above 0 to below (or vice versa). Since the autocorrelation
// results stored in B will be a signal from -1 to 1, this will provide
// a rudimentary pitch detection for the signal. Emphasis on rudimentary.
vDSP_Length lastZeroCrossing;
vDSP_Length zeroCrossingCount;
vDSP_nzcros(B, 1, N, &lastZeroCrossing, &zeroCrossingCount, N);
// At this point zeroCrossingCount holds roughly double the number of
// oscillations found in B (2 zero crossings = 1 oscillation), which scales with pitch
```
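To turn that count into something like a frequency, one rough approach is to halve it (2 crossings per oscillation) and scale by how much time the buffer covers. The sketch below is plain C++ with hypothetical helper names; treating a sign change between neighbouring samples as a “crossing” and the 44100 Hz sample rate in the usage note are both assumptions, not something Accelerate dictates.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Plain-loop illustration: counts sign changes between neighbouring samples.
// Hypothetical helper, assuming a "crossing" is any change of sign bit.
std::size_t countZeroCrossings(const float* signal, std::size_t count) {
    std::size_t crossings = 0;
    for (std::size_t i = 1; i < count; ++i) {
        if (std::signbit(signal[i]) != std::signbit(signal[i - 1])) {
            ++crossings;
        }
    }
    return crossings;
}

// Rough frequency estimate: (crossings / 2) oscillations happened during
// (sampleCount / sampleRate) seconds. sampleRate is supplied by the caller.
float estimateFrequencyHz(std::size_t zeroCrossings, std::size_t sampleCount,
                          float sampleRate) {
    assert(sampleCount > 0);
    const float cycles = static_cast<float>(zeroCrossings) / 2.0f;
    return cycles * sampleRate / static_cast<float>(sampleCount);
}
```

For example, 20 zero crossings in a 1024-sample buffer recorded at 44100 Hz would come out at about 430.7 Hz. Emphasis, again, on rudimentary.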

The vDSP reference and list of functions can be found here

Shouts to Golan Levin and Kyle McDonald for their DSP workshop at Eyeo.