Topic: ofxTesseract OCR
julapy
sydney

Posts: 330

ofxTesseract OCR
« on: April 22, 2010, 10:54:31 AM »

hi all,

managed to get tesseract working in OF.
http://code.google.com/p/tesseract-ocr/

the tesseract library is an OCR (optical character recognition) engine.
basically it searches an image for words or text and gives them back to you as a string.

i've created an ofxTesseract wrapper which is very simple at the moment but should get people started quickly.

download it here,
http://julapy.com/source/tesseractExample.zip

originally i found this source by nolan brown,
http://github.com/nolanbrown/Tesseract-iPhone-Demo
the tesseract library was already compiled in the iphone xcode project and so i figured it would work in OF on a mac, which it did.

it might work better if the library is compiled again, specifically for mac os x, but this is a bit over my head. it'd be great if someone knew how to do this.

something to be aware of,
in the xcode project -> executables
double click the executable and a window will come up.
in the Arguments tab, i had to add the following,
TESSDATA_PREFIX = "../../../data/"

this tells the library where the "tessdata" folder is located.
so far i haven't been able to figure out any other way of specifying the data path.

L.

julapy
sydney

Posts: 330

Re: ofxTesseract OCR
« Reply #1 on: April 27, 2010, 07:37:42 AM »

thought i'd post something i've been working on using ofxTesseract.



subscan - basically scans for movie subtitles.
not sure where this side project is going... but it's been a good exercise in openCV.
also nice to be able to transcribe all that movie data in real-time and have it at hand.
if anyone has any cool ideas on what to do with all this movie data, would love to hear them!

rtwomey

Posts: 12

Re: ofxTesseract OCR
« Reply #2 on: December 19, 2010, 02:42:22 AM »

Hi,

Thanks for your work on this!  I compiled Tesseract-3.00 for i386 on OS X and have used it in openFrameworks.  This update seems to have fixed some of the problems with setting data paths etc., and it also solved a mysterious "tesseract crashes 30% of the time" issue I was having (which may or may not have been a problem with my own code).

Anyway, I've put an updated version of the tesseractExample online here:

http://svn.roberttwomey.com/of/tesseractExample/

Additionally, I have integrated Flamingo, an approximate string-matching library from UC Irvine, to match OCR results against known possibilities.  I will do a separate post under "extend" to cover that.  See this post http://forum.openframeworks.cc/index.php/topic,5206.msg25900.html#msg25900
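
To illustrate the general idea of matching OCR output against known possibilities, here is a minimal sketch. It uses a plain Levenshtein edit distance, not Flamingo or its API (which isn't shown in this thread), and the names editDistance and closestMatch are made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance: insertions, deletions and substitutions each cost 1.
// Only two rows of the DP table are kept at a time.
int editDistance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical helper: returns the known string closest to the OCR result,
// or an empty string if the list of known strings is empty.
std::string closestMatch(const std::string& ocr,
                         const std::vector<std::string>& known) {
    std::string best;
    int bestDist = -1;
    for (const auto& k : known) {
        int d = editDistance(ocr, k);
        if (bestDist < 0 || d < bestDist) {
            bestDist = d;
            best = k;
        }
    }
    return best;
}
```

For example, closestMatch("he11o wor1d", {"hello world", "goodbye"}) picks "hello world", since only three characters differ. A dedicated library would likely scale much better to large dictionaries than this linear scan.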

If you try them out, please let me know how they work for you.

Best,

Robert

http://roberttwomey.com
Doctoral Student, Center for Digital Arts and Experimental Media
University of Washington, Seattle, WA
rtwomey@u.washington.edu
stephanschulz
Montreal

Posts: 360

Re: ofxTesseract OCR
« Reply #3 on: December 29, 2010, 03:56:43 PM »

hi

i downloaded the xcode project (the one without flamingo). it compiles fine but when trying to run it i get this error message:

ofxTesseract :: loading tessdata from - ../../../data/tessdata
Error openning data file /usr/local/share/tessdata/eng.traineddata

any ideas?

thanks,
stephan.

osx 10.6.8
OF 007
julapy
sydney

Posts: 330

Re: ofxTesseract OCR
« Reply #4 on: January 09, 2011, 06:21:50 AM »

hi robert,
awesome that you got Tesseract-3.00 compiling!
i've run your tesseractExample and i'm getting the same error as stephan.
could there be a step i missed somewhere?

kylemcdonald
Brooklyn

Posts: 1141

Re: ofxTesseract OCR
« Reply #5 on: January 27, 2011, 04:36:54 AM »

i'm also curious to get the latest tesseract working with OF, any leads would be very welcome!

superscam

Posts: 17

Re: ofxTesseract OCR
« Reply #6 on: February 14, 2011, 08:06:06 PM »

Hey guys!

OCR in OF sounds ace! I couldn't get any of these examples to work though... the first one seems to compile but then disappears straight away with an "emptyExample has exited with status 1" comment in the message bar at the bottom of xcode.  The second download seems to fail with these 3 errors in main.cpp:

/Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13:0 /Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13: error: expected type-specifier before 'testApp'

/Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13:0 /Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13: error: expected `)' before 'testApp'

/Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13:0 /Users/SCam/Documents/of_preRelease_v0062_osxSL_FAT/apps/addonsExamples/tesseractV2download/src/main.cpp:13: error: cannot convert 'int*' to 'ofBaseApp*' for argument '1' to 'void ofRunApp(ofBaseApp*)'

Any pointers as to where I'm going wrong?...I'd love to have a play with this!

Cheers, sCam
kylemcdonald
Brooklyn

Posts: 1141

Re: ofxTesseract OCR
« Reply #7 on: March 13, 2011, 06:06:03 PM »

just tried the first example, and it works fine for me with minor tweaking.

i'm using it with OF-github and started with the opencv example.

then i swapped out the source and added ofxTesseract.

then i got the error "emptyExample has exited with status 1".

to fix this, i had to create a directory at /usr/local/ called 'share', then one called 'tessdata' inside share/. then i moved everything from bin/data/tessdata/ into /usr/local/share/tessdata/ and everything worked.

now that i can see it's working i'm going to try getting the latest tesseract compiled and try that instead. i'd rather not have to create that /usr/local/share/tessdata folder.

kylemcdonald
Brooklyn

Posts: 1141

Re: ofxTesseract OCR
« Reply #8 on: March 14, 2011, 04:52:28 AM »

@stephanschulz, @julapy -- there seems to still be some kind of error with tesseract recognizing the path you give it.

but i looked through the tesseract source and noticed it has an env variable you can set to override everything else (this is in mainblk.cpp at void CCUtil::main_setup).

so if you have your data in bin/data/tessdata, you can just say:

Code:
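	// absolute path to bin/data, the folder that contains tessdata/
	string tessdataPath = ofToDataPath("", true);
	// the final 1 overwrites any existing TESSDATA_PREFIX value
	setenv("TESSDATA_PREFIX", tessdataPath.c_str(), 1);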

and that will force tesseract to use the right location. kind of a hack, but i'm not sure why it's broken right now... the secret might be somewhere in the tesseract getpath() function, but it's a bit messy and hard to decipher.
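
For anyone who wants to try the same override outside of OF, here is a minimal standalone sketch. The helper name setTessdataPrefix is hypothetical, and the trailing-slash handling is an assumption: it guesses that tesseract appends "tessdata/" directly to the prefix, which would match julapy's original "../../../data/" value ending in a slash. setenv and getenv are the standard POSIX/C calls, so this applies to mac and linux only.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: points TESSDATA_PREFIX at the folder that contains
// the "tessdata" directory. Returns true if the variable was set.
bool setTessdataPrefix(const std::string& dataDir) {
    if (dataDir.empty()) return false;
    std::string prefix = dataDir;
    // Assumption: tesseract appends "tessdata/" straight onto the prefix,
    // so make sure the prefix ends with a slash.
    if (prefix.back() != '/') prefix += '/';
    // Third argument 1 = overwrite an existing value.
    return setenv("TESSDATA_PREFIX", prefix.c_str(), 1) == 0;
}
```

Call it before tesseract is initialised, e.g. setTessdataPrefix("/path/to/app/bin/data"), and check std::getenv("TESSDATA_PREFIX") afterwards if the data still isn't found.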

stephanschulz
Montreal

Posts: 360

Re: ofxTesseract OCR
« Reply #9 on: March 14, 2011, 02:24:47 PM »

thanks for this information. i will give it a try.

stephan.

osx 10.6.8
OF 007
kylemcdonald
Brooklyn

Posts: 1141

Re: ofxTesseract OCR
« Reply #10 on: March 14, 2011, 11:41:46 PM »

i just rewrote ofxTesseract to expose some helpful options and clean up the code a bit. it's on github: https://github.com/kylemcdonald/ofxTesseract

i'm running against of/github so it won't work in 0062.

stephanschulz
Montreal

Posts: 360

Re: ofxTesseract OCR
« Reply #11 on: March 30, 2011, 02:54:34 PM »

thanks for posting it.
so far i have not been able to run it though.

i get errors connected to ofxAutoControlPanel

error: 'setXMLFilename' was not declared in this scope
error: 'class ofxAutoControlPanel' has no member named 'hasValueChanged'

i will do some more searching to figure out why this happens. but maybe you already know.

thanks,
stephan.

osx 10.6.8
OF 007
stephanschulz
Montreal

Posts: 360

Re: ofxTesseract OCR
« Reply #12 on: March 30, 2011, 03:17:42 PM »

ok i think i got it now.

i used the latest OF version from github and added ofxControlPanel from here: https://github.com/ofTheo/ofxControlPanel

compiled as release and now it works.

s.

osx 10.6.8
OF 007
lukasz
Grand Rapids, MI

Posts: 39

Re: ofxTesseract OCR
« Reply #13 on: April 29, 2012, 04:11:33 PM »

I seem to be having the same problem. I'm running the latest version, however, and still no luck. I don't see an ofxAutoControlPanel.h in the library; maybe I am missing something.
« Last Edit: April 29, 2012, 07:31:45 PM by lukasz »
