Custom OCR application in C++
Do you want to create your own OCR application? It is easy.
Requirements:
Installed Tesseract (4) development files. On Windows you can use the output of "Installing dependencies" from my previous blog post. On Linux, install a package such as libtesseract-dev (Ubuntu bionic).
Do not forget to set up the environment. Here are the details for Windows:
"c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" x64
SET PATH=%PATH%;f:\win64\bin
SET TESSDATA_PREFIX=f:\Project\tessdata
Create a (text) file mytesseract.cpp with the following code:
#include <cstdio>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>

int main(int argc, char* argv[]) {
    char *outText;

    if (argc == 1) {
        printf("Program usage:\n\t %s image_filename\n", argv[0]);
        return 0;
    }

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

    // Suppress tesseract debug messages
    // ("/dev/null" is the Linux null device; on Windows it is "nul")
    api->SetVariable("debug_file", "/dev/null");

    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        delete api;
        return 1;
    }

    // Open input image with leptonica library
    Pix *image = pixRead(argv[1]);
    if (image == NULL) {
        fprintf(stderr, "Could not load image from filename '%s'.\n", argv[1]);
        api->End();
        delete api;
        return 2;
    }

    api->SetImage(image);

    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used objects and release memory
    api->End();
    delete api;
    delete [] outText;
    pixDestroy(&image);

    return 0;
}
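The program above prints the recognized text as is. OCR output often contains blank lines and trailing whitespace, so a small post-processing step may be useful before using the text elsewhere. The following is a sketch in plain C++ with no Tesseract dependency. The function name nonempty_lines is hypothetical, and which characters count as trailing whitespace is an assumption made for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits OCR text into lines, strips trailing
// whitespace (space, tab, CR, form feed) from each line and skips
// lines that are empty afterwards. Leading whitespace is kept.
std::vector<std::string> nonempty_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        const auto end = line.find_last_not_of(" \t\r\f");
        if (end == std::string::npos) {
            continue;  // line is empty or whitespace only
        }
        lines.push_back(line.substr(0, end + 1));
    }
    return lines;
}
```

In the program above, it could be called as nonempty_lines(outText) right after GetUTF8Text() and before outText is released.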
Finally, you need to compile it. On Windows:
cl /EHsc mytesseract.cpp /If:\win64\include /link /LIBPATH:F:/WIN64/LIB tesseract41.lib leptonica-1.81.0.lib /machine:x64 /out:mytesseract.exe
or on Linux:
g++ mytesseract.cpp -o mytesseract -ltesseract -llept