Custom OCR application in C++

Do you want to create your own OCR application? It is easy.


Requirements:

You need the tesseract (4) development files installed. On Windows you can use the output of "Installing dependencies" from my previous blog post. On Linux, install a package such as libtesseract-dev (Ubuntu bionic).

Do not forget to set up the environment. Here are the details for Windows:

"c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" x64
SET PATH=%PATH%;f:\win64\bin
SET TESSDATA_PREFIX=f:\Project\tessdata
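
The program in this post does not pass a data path to tesseract, so it relies on TESSDATA_PREFIX being set as shown above. A minimal sketch of a start-up check could look like this. The helper names envOrEmpty and tessdataPrefixIsSet are my own assumptions for illustration; they are not part of the Tesseract API.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Tesseract): returns the value of an
// environment variable, or an empty string when the variable is not set.
std::string envOrEmpty(const char *name) {
    const char *value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

// Hypothetical helper: true when TESSDATA_PREFIX is set to a non-empty value.
bool tessdataPrefixIsSet() {
    return !envOrEmpty("TESSDATA_PREFIX").empty();
}
```

You could call tessdataPrefixIsSet() at the top of main() and print a warning to stderr when it returns false, which is easier to diagnose than a failed initialization.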



Create a (text) file mytesseract.cpp with the following code:

#include <cstdio>
#include <leptonica/allheaders.h>
#include <tesseract/baseapi.h>


int main(int argc,char* argv[]) {

    char *outText;

    if(argc==1) {
        printf("Program usage:\n\t %s image_filename\n", argv[0]);
        return 0;
    }

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

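    // suppress tesseract debug messages (the null device is "nul" on Windows)
#ifdef _WIN32
    api->SetVariable("debug_file", "nul");
#else
    api->SetVariable("debug_file", "/dev/null");
#endif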
    // Initialize tesseract-ocr with English; with NULL as the data path,
    // tesseract uses TESSDATA_PREFIX (or its built-in default) to find tessdata
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        delete api;
        return 1;
    }

    // Open input image with leptonica library
    Pix *image = pixRead(argv[1]);
    if (image == NULL) {
        fprintf(stderr, "Could not load image from filename '%s'.\n", argv[1]);
        // release the API object before leaving
        api->End();
        delete api;
        return 2;
    }
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used objects and release memory
    api->End();
    delete api;
    delete [] outText;
    pixDestroy(&image);

    return 0;
}
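
The text returned by GetUTF8Text() usually ends with one or more line breaks, which can get in the way if you want to store or compare the result. A minimal sketch of a clean-up step follows. trimTrailingWhitespace is a hypothetical helper of my own, not a Tesseract function, and it returns a std::string copy so the original buffer still has to be released with delete [] as in the program above.

```cpp
#include <string>

// Hypothetical helper (not part of Tesseract): copies a C string such as the
// one returned by GetUTF8Text() and drops trailing whitespace.
std::string trimTrailingWhitespace(const char *raw) {
    // Guard against a null pointer, which GetUTF8Text() may return on failure.
    if (raw == nullptr) return std::string();
    std::string text(raw);
    const std::string::size_type last = text.find_last_not_of(" \t\r\n\f\v");
    if (last == std::string::npos) return std::string();  // only whitespace
    text.erase(last + 1);
    return text;
}
```

In the program above you could write std::string clean = trimTrailingWhitespace(outText); right after the GetUTF8Text() call and print clean.c_str() instead of outText.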
  


Finally, you need to compile it:

cl /EHsc mytesseract.cpp /If:\win64\include /link /LIBPATH:F:/WIN64/LIB tesseract41.lib leptonica-1.81.0.lib  /machine:x64 /out:mytesseract.exe

or on Linux:

 g++ mytesseract.cpp -o mytesseract -ltesseract -llept


