Custom OCR application in C++
Do you want to create your own OCR application? It is easy.
First, install the tesseract (4) development files. On Windows you can use the output of "Installing dependencies" from my previous blog post. On Linux, install a package such as libtesseract-dev (Ubuntu bionic).
Do not forget to set up the environment. Here are the details for Windows:
"c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" x64
SET PATH=%PATH%;f:\win64\bin
SET TESSDATA_PREFIX=f:\Project\tessdata
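If initialization fails later, a wrong TESSDATA_PREFIX is a common cause. The following is a minimal sketch of a check you could run before calling tesseract. It assumes that TESSDATA_PREFIX points at the tessdata directory itself (as in the SET line above) and that the language data is stored as <lang>.traineddata. The helper names traineddata_path and language_data_available are made up for this illustration and are not part of the tesseract API.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Hypothetical helper: builds the expected path of the language data file,
// assuming `prefix` is the tessdata directory itself.
std::string traineddata_path(std::string prefix, const std::string& lang) {
    // Append a separator only if the prefix does not already end with one.
    // A forward slash is also accepted by the Windows file APIs.
    if (!prefix.empty() && prefix.back() != '/' && prefix.back() != '\\') {
        prefix += '/';
    }
    return prefix + lang + ".traineddata";
}

// Hypothetical helper: returns true if TESSDATA_PREFIX is set and the
// language data file can be opened for reading.
bool language_data_available(const std::string& lang) {
    const char* prefix = std::getenv("TESSDATA_PREFIX");
    if (prefix == nullptr || *prefix == '\0') {
        return false;  // variable not set or empty
    }
    std::ifstream file(traineddata_path(prefix, lang), std::ios::binary);
    return file.good();
}
```

For example, calling language_data_available("eng") at the start of main lets the program print a clearer message than the generic initialization error.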
Create a text file mytesseract.cpp with the following code:
#include <cstdio>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>

int main(int argc, char* argv[]) {
    char *outText;

    if (argc == 1) {
        printf("Program usage:\n\t %s image_filename\n", argv[0]);
        return 0;
    }

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Suppress tesseract debug messages
    api->SetVariable("debug_file", "/dev/null");

    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        delete api;
        return 1;
    }

    // Open input image with leptonica library
    Pix *image = pixRead(argv[1]);
    if (image == NULL) {
        fprintf(stderr, "Could not load image from filename '%s'.\n", argv[1]);
        api->End();
        delete api;
        return 2;
    }
    api->SetImage(image);

    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used objects and release memory
    api->End();
    delete api;
    delete [] outText;
    pixDestroy(&image);
    return 0;
}
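The program above frees the text buffer by hand with delete []. If you prefer not to track that yourself, one option is to copy the result into a std::string right away. The sketch below assumes the buffer was allocated with new[], which is how the program above treats the result of GetUTF8Text(). The helper name take_text is made up for this illustration and is not part of the tesseract API.

```cpp
#include <memory>
#include <string>

// Hypothetical helper: takes ownership of a buffer allocated with new[]
// and returns its contents as a std::string.
// A null pointer yields an empty string.
std::string take_text(char* raw) {
    // unique_ptr<char[]> calls delete[] on the buffer when it goes out of scope.
    std::unique_ptr<char[]> owner(raw);
    return owner ? std::string(owner.get()) : std::string();
}
```

In the program above this would be used as std::string text = take_text(api->GetUTF8Text()); and the later delete [] outText; line would then be dropped.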
Finally you need to compile it:
cl /EHsc mytesseract.cpp /If:\win64\include /link /LIBPATH:F:/WIN64/LIB tesseract41.lib leptonica-1.81.0.lib /machine:x64 /out:mytesseract.exe
or on Linux:
g++ mytesseract.cpp -o mytesseract -ltesseract -llept
