I have written a simple speech recognition and synthesis lib for oF. It only works on the Mac for now (it is a wrapper for the Carbon speech APIs in OS X), but I would like to port it to Windows and Linux if possible.

It’s still rough around the edges (I’ve had to relearn quite a bit of C++ and learn the way Carbon does things, which is a bit odd sometimes), but I will be polishing it up as I go. It’s still a work in progress: there is a txt file in the doc folder that roughly explains how to use it, and more thorough documentation will follow soon.

I have put the code here:
(if you don’t use git, you can still download a zip file or a tarball)

There’s a short video here:



wow, nice man, congratulations. i will try it on windows…

:slight_smile: take care… and congratulations again…

wow, looking really cool! there is espeakup on linux which might integrate easily.

Hi :smiley: this is so cool! I love it! Thanks a lot for this!
greetings ascorbin


this is really cool… congratulations.

i had a quick play and it was working sometimes for some commands like ‘red’ but had trouble with others.

maybe it’s having trouble recognising my australian accent. :slight_smile:

is there something else i can do to configure speech recognition settings in the OS perhaps?


hm, it doesn’t seem like espeakup is doing recognition.
However, several speech-recognition programs are listed here:
Don’t know which one is the best yet.

I like this, it’s pretty simple to use, i.e. plugging a new dictionary in is pretty easy. I’m not sure what people would do on Linux. I’ve been using Julius for some things and it’s pretty efficient if given a good dictionary to work with, and it can use grammar files as well, which is pretty neat. Training is do-able, which is something I could never get to work with Sphinx. I can’t confirm this, but from what I’ve heard the speech recognition stuff in Windows 7 is pretty good, so tying into that would be neat.

Well, good luck to you trying to tie anything Windows into OF. It probably wouldn’t work with Code::Blocks for Windows (well, actually just GCC/MinGW) because chances are it’s a .NET-compiled library (which is actually M$IL binary code instead of system-level binary) and therefore is annoyingly unable to cross-compile with GCC/system-level binaries or code.

In other words, Window$ fails us again.

I tried to implement this in Snow Leopard using a small dictionary: “one”, “two”, “three”.

My results are very inconsistent. In the printout I get a lot of junk, and only infrequently does it respond correctly. Any ideas about why this might be happening? I’m wondering what version of the OS you were using…

Language Model Was Loaded
word is: |???\277\300EB|
word is: |???\277\300EB|
word is: |???\277\300EB|
word is: |one\277\300EB|
word is: |???\277\300EB|
word is: |threeEB|
word is: |???\277\300EB|
word is: |???\277\300EB|
word is: |???\277\300EB|
word is: |???\277\300EB|


After getting an external mic and playing a lot with the vocabulary I’m pretty disappointed with the default speech recognition - you can use it with about 60-80% accuracy for a limited vocabulary…

Some tips - open your system preferences - go to speech and voice and calibrate your mic and speech recognition.

Hi seh4b,

ofxSpeech does indeed have some limitations. I initially wrote it for a project that ran on OS X 10.5 and put it up figuring others might find it useful. The lib is written using the old Carbon Speech Manager instead of the Cocoa facilities (NSSpeechRecognizer), because going through Carbon didn’t require writing a C++ wrapper around an Obj-C class.

Dhruv Adhia found the best settings for 10.5 to be the following:

In the Speech Recognition prefpane:

  • set Speakable Items to off
  • set listening to “listen continuously with keyword” (but leave the keyword field blank)
  • calibrate the microphone so that it doesn’t cross the green zone (the middle seems to work quite well)

Then in the Sound prefpane:

  • Set the input volume to max
  • check “Use ambient noise reduction” (which should be on by default)

All that said, 10.6 brings along issues with the strings generated by the recognition callback. I believe this may be caused by the way Carbon support works in Snow Leopard, by the way I am handling the strings returned by the recognition callback, or by a combination of the two. Fixing the strings, however, won’t do much for the accuracy of the recognition itself beyond what the settings mentioned above already give you.

I’ll look into the string issue now that I’m running 10.6 and report back with any progress made in terms of the output the library delivers.
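
One possible cause of the trailing junk, offered as an assumption rather than a diagnosis of ofxSpeech: if the callback hands back a text buffer plus a separate length and the buffer is not null-terminated, then building a std::string from the pointer alone keeps reading until it happens to hit a zero byte. The sketch below makes no Carbon calls; `textFromBuffer` and `naiveText` are hypothetical names used only to contrast the two constructions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build the string from the buffer AND the length
// reported with it, so nothing past the recognized text is copied.
std::string textFromBuffer(const char *buffer, std::size_t reportedLength)
{
    assert(buffer != nullptr);
    return std::string(buffer, reportedLength);
}

// Hypothetical "naive" version: ignores the reported length and copies
// everything up to the first zero byte, leftover bytes included.
std::string naiveText(const char *buffer)
{
    assert(buffer != nullptr);
    return std::string(buffer);
}
```

Given a buffer holding `one\277\300EB` and a reported length of 3, `textFromBuffer` yields `one`, while `naiveText` yields all seven bytes, which matches the shape of the junk in the printout above.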



i’ve been trying to implement ofxSpeech for a project i did using the old carbon libs and i’m having the same string problem… has anyone solved this?
any pointers?



All of you having the ///??? skfjdhsf sneeze etc…
Substitute your void cleanUpString(std::string &stringToClean){} with the code below.
This cleans anything that is not a letter from your result.
Latrokles’s voice recognition example (https://github.com/latrokles/ofxSpeech/tree/master/examples/speech_recognition) works better with else if statements rather than plain ifs all the way through in draw…

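// this is the edited version for the ofxSpeechRecognizer.cpp  
void cleanUpString(std::string &stringToClean)  
{  
        // A = 65, Z = 90  
        // a = 97, z = 122  
        // skip leading spaces; if there is nothing else, return an empty string  
        std::string::size_type firstCharacter = stringToClean.find_first_not_of(" ");  
        if (firstCharacter == std::string::npos) {  
            stringToClean.clear();  
            return;  
        }  
        // cut the string at the first character that is not a letter  
        std::string::size_type lastCharacter = stringToClean.size();  
        for (std::string::size_type i = firstCharacter; i < stringToClean.size(); i++) {  
            char character = stringToClean.at(i);  
//            cout << character + 0 << endl;  
            if (!((character > 64 && character < 91) ||  
                (character > 96 && character < 123))) {  
                lastCharacter = i;  
                break;  
            }  
        }  
        std::string tempString  = stringToClean;  
        stringToClean           = tempString.substr(firstCharacter, (lastCharacter-firstCharacter));  
}  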

Hope it helps!
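
For anyone who wants to see the two suggestions in that post side by side, here is a minimal, self-contained sketch (not the ofxSpeech code itself). `leadingLetters` is a hypothetical stand-alone take on the letter-only clean-up, and `colorForWord` is a hypothetical command handler showing the else if chain; the “red” command comes from earlier in the thread, while “green”, “blue” and the packed RGB values are made up for illustration.

```cpp
#include <string>

// True for A-Z and a-z only (same ranges as the comments in the post above).
static bool isAsciiLetter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// Hypothetical helper: skip leading spaces, then keep the run of letters
// up to the first character that is not a letter.
std::string leadingLetters(const std::string &raw)
{
    std::string::size_type start = raw.find_first_not_of(' ');
    if (start == std::string::npos) {
        return "";
    }
    std::string::size_type end = start;
    while (end < raw.size() && isAsciiLetter(raw[end])) {
        ++end;
    }
    return raw.substr(start, end - start);
}

// Hypothetical command handler: with else if, at most one branch runs per
// recognized word, and an unknown word leaves the current color untouched.
int colorForWord(const std::string &word, int currentColor)
{
    int color = currentColor;
    if (word == "red") {
        color = 0xFF0000;
    } else if (word == "green") {
        color = 0x00FF00;
    } else if (word == "blue") {
        color = 0x0000FF;
    }
    return color;
}
```

With the strings from the Snow Leopard printout, `leadingLetters` turns `one\277\300EB` into `one` and `???\277\300EB` into an empty string. It cannot help with a case like `threeEB`, where the leftover bytes are themselves letters, so that case needs a fix at the point where the string is built.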