ofxTesseract OCR

hi all,

managed to get tesseract working in OF.
http://code.google.com/p/tesseract-ocr/

the tesseract library is an OCR (optical character recognition) engine.
basically it searches an image for text and gives it back to you as a string.

i’ve created an ofxTesseract wrapper which is very simple at the moment but should get people started quickly.

download it here,
http://julapy.com/source/tesseractExample.zip

originally i found this source by nolan brown,
http://github.com/nolanbrown/Tesseract-iPhone-Demo
the tesseract library was already compiled in the iphone xcode project and so i figured it would work in OF on a mac, which it did.

it might work better if the library were recompiled specifically for mac os x, but this is a bit over my head. it would be great if someone knew how to do this.

something to be aware of,
in the xcode project -> executables
double click the executable and a window will come up.
in the Arguments tab, i had to add the following,
TESSDATA_PREFIX = “…/…/…/data/”

this tells the library where the “tessdata” folder is located.
so far i haven’t been able to figure out any other way of specifying the data path.

L.

thought i’d post something i’ve been working on using ofxTesseract.

http://vimeo.com/11255515

subscan - basically scans for movie subtitles.
not sure where this side project is going… but it’s been a good exercise in openCV.
also nice to be able to transcribe all that movie data in real-time and have it at hand.
if anyone has any cool ideas on what to do with all this movie data, would love to hear them!

Hi,

Thanks for your work on this! I compiled Tesseract-3.00 for i386 on OS X and have used it in OpenFrameworks. This update seems to have fixed some of the problems with setting data-paths etc., plus solved a mysterious “tesseract crashes 30% of the time” issue I was having (which may or may not have been a problem with my coding).

Anyway, I’ve put an updated version of the tesseractExample online here:

http://svn.roberttwomey.com/of/tesseractExample/

Additionally, I have integrated Flamingo, an approximate string-matching library from UC Irvine, to match OCR results against known possibilities. I will do a separate post under “extend” to cover that. See this post http://forum.openframeworks.cc/t/tesseract-ocr-and-flamingo-validation–approximate-matching/5206/1
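To give a rough idea of what matching OCR results against known possibilities means, here is a minimal sketch in plain C++. It does not use Flamingo or its API; it swaps in a simple Levenshtein edit distance, and the names `editDistance`, `closestMatch` and the `maxDistance` threshold are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein edit distance: the number of single-character insertions,
// deletions and substitutions needed to turn a into b.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitute = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitute});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Returns the candidate closest to the OCR output, or an empty string when
// no candidate is within maxDistance edits (hypothetical threshold).
std::string closestMatch(const std::string& ocrText,
                         const std::vector<std::string>& candidates,
                         std::size_t maxDistance) {
    std::string best;
    std::size_t bestDistance = maxDistance + 1;
    for (const std::string& candidate : candidates) {
        std::size_t d = editDistance(ocrText, candidate);
        if (d < bestDistance) {
            bestDistance = d;
            best = candidate;
        }
    }
    return best;
}
```

With a small list of expected words, an OCR result with one misread character can be mapped back to the intended word by calling `closestMatch` with a threshold of 1 or 2. This linear scan only makes sense for short candidate lists.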

If you try them out, please let me know how they work for you.

Best,

Robert

hi

i downloaded the xcode project (the one without flamingo). it compiles fine, but when i try to run it i get this error message:

ofxTesseract :: loading tessdata from - …/…/…/data/tessdata
Error openning data file /usr/local/share/tessdata/eng.traineddata

any ideas?

thanks,
stephan.

hi robert,
awesome that you got Tesseract-3.00 compiling!
i’ve run your tesseractExample and also getting the same error as stephan.
could there be a step i missed somewhere?

i’m also curious to get the latest tesseract working with OF, any leads would be very welcome!

Hey guys!

OCR in OF sounds ace! I couldn’t get any of these examples to work though… the first one seems to compile but then exits straight away with an “emptyExample has exited with status 1” message in the bar at the bottom of xcode. The second download fails with these 3 errors in main.cpp:

/Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13:0 /Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13: error: expected type-specifier before ‘testApp’

/Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13:0 /Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13: error: expected `)’ before ‘testApp’

/Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13:0 /Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13: error: cannot convert ‘int*’ to ‘ofBaseApp*’ for argument ‘1’ to ‘void ofRunApp(ofBaseApp*)’

Any pointers as to where I’m going wrong?..I’d love to have a play with this!

Cheers, sCam

just tried the first example, and it works fine for me with minor tweaking.

i’m using it with OF-github and started with the opencv example.

then i swapped out the source and added ofxTesseract.

then i got the error “emptyExample has exited with status 1”.

to fix this, i had to create a directory at /usr/local/ called ‘share’, then one called ‘tessdata’ inside share/. then i moved everything from bin/data/tessdata/ into /usr/local/share/tessdata/ and everything worked.
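to confirm that a missing data file is the problem before moving anything around, you can check for the file yourself. this is a minimal sketch in plain c++, assuming tesseract 3.x opens `<prefix>tessdata/<lang>.traineddata` as the error message earlier in the thread suggests. `traineddataPath` and `fileExists` are made-up helper names, not part of tesseract or ofxTesseract.

```cpp
#include <fstream>
#include <string>

// Builds the path tesseract is assumed to open for a given prefix and
// language code: <prefix>tessdata/<lang>.traineddata
std::string traineddataPath(std::string prefix, const std::string& lang) {
    // Add the trailing slash if the caller left it off.
    if (!prefix.empty() && prefix.back() != '/') prefix += '/';
    return prefix + "tessdata/" + lang + ".traineddata";
}

// True if the file can be opened for reading.
bool fileExists(const std::string& path) {
    std::ifstream f(path.c_str());
    return f.good();
}
```

calling `fileExists(traineddataPath("/usr/local/share", "eng"))` at startup and logging the result gives a clearer hint than an app that just exits with status 1.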

now that i can see it’s working i’m going to try getting the latest tesseract compiled and try that instead. i’d rather not have to create that /usr/local/share/tessdata folder.

@stephanschulz, @julapy – there seems to still be some kind of error with tesseract recognizing the path you give it.

but i looked through the tesseract source and noticed it has an env variable you can set to override everything else (this is in mainblk.cpp at void CCUtil::main_setup).

so if you have your data in bin/data/tessdata, you can just say:

  
  
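	// do this before tesseract is initialized
	string tessdataPath = ofToDataPath("", true); // absolute path to the data folder
	setenv("TESSDATA_PREFIX", tessdataPath.c_str(), 1); // 1 = overwrite any existing value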
  

and that will force tesseract to use the right location. kind of a hack, but i’m not sure why it’s broken right now… the secret might be somewhere in the tesseract getpath() function, but it’s a bit messy and hard to decipher.
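if you would rather let a value set outside the app (for example in the xcode executable settings) take priority, `setenv` can be called with its overwrite flag set to 0. a small sketch using only posix calls; `ensureTessdataPrefix` is a hypothetical helper name, not part of ofxTesseract.

```cpp
#include <stdlib.h>  // setenv, getenv (POSIX)
#include <string>

// Sets TESSDATA_PREFIX to fallbackDir only when the variable is not already
// defined, then returns whatever value is in effect.
std::string ensureTessdataPrefix(const std::string& fallbackDir) {
    // overwrite = 0 keeps any value that was set outside the program.
    setenv("TESSDATA_PREFIX", fallbackDir.c_str(), 0);
    const char* value = getenv("TESSDATA_PREFIX");
    return value ? std::string(value) : std::string();
}
```

in an OF app this could be called as `ensureTessdataPrefix(ofToDataPath("", true))` before tesseract is set up, and the returned string logged to see which location is actually used.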

thanks for this information. i will give it a try.

stephan.

i just rewrote ofxTesseract to expose some helpful options and clean up the code a bit. it’s on github: https://github.com/kylemcdonald/ofxTesseract

i’m running against of/github so it won’t work in 0062.

thanks for posting it.
so far i have not been able to run it though.

i get errors connected to ofxAutoControlPanel

error: ‘setXMLFilename’ was not declared in this scope
error: ‘class ofxAutoControlPanel’ has no member named ‘hasValueChanged’

i will do some more searching to figure out why this happens. but maybe you already know.

thanks,
stephan.

ok i think i got it now.

i used the latest OF version from the github. added ofxControlPanel from here https://github.com/ofTheo/ofxControlPanel

compiled as release and now it works.

s.

I seem to be having the same problem. I’m running the latest version, however, and still no luck. I don’t see an ofxAutoControlPanel.h in the library, so maybe I am missing something.

i know it’s probably way too late to respond to this, but it’s because i’m using my fork of ofxControlPanel available here https://github.com/kylemcdonald/ofxControlPanel