OpenCV and Tesseract

Do you need to OCR an OpenCV image? No problem with Tesseract.

Here is a simple example of how to OCR OpenCV images.


Check that OpenCV is installed (e.g. with rpm -qa | grep -i opencv on openSUSE), or download it from the official OpenCV site (e.g. for Windows).

I assume you already have Tesseract installed, as explained in the tesserocr blog post.

As test input we can use the images available from the Electronic Text Center.

Let's create a file opencv_tesseract.cpp with the following code:

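#include <cstdio>
#include <iostream>
#include <string>

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <opencv2/opencv.hpp>
#include <opencv2/imgproc.hpp>

int main(int argc, char* argv[]) {

    if (argc < 2) {
        printf("Program usage:\n\t %s image_filename\n", argv[0]);
        return 1;
    }

    // Load the image as 8-bit grayscale, i.e. one byte per pixel
    std::string imPath = argv[1];
    cv::Mat cv_image = cv::imread(imPath, cv::IMREAD_GRAYSCALE);
    if (cv_image.empty()) {
        std::cerr << "Could not read image: " << imPath << std::endl;
        return 1;
    }

    setMsgSeverity(L_SEVERITY_NONE);  // turn off leptonica messages
    tesseract::TessBaseAPI* ocr = new tesseract::TessBaseAPI();
    // suppress tesseract debug messages
    // (on Windows the null device is "nul" rather than "/dev/null")
    ocr->SetVariable("debug_file", "/dev/null");

    // Init() returns non-zero on failure
    if (ocr->Init(NULL, "eng")) {
        std::cerr << "Failed to initialize Tesseract." << std::endl;
        delete ocr;
        return 1;
    }

    ocr->SetVariable("user_defined_dpi", "300");
    // Arguments: pixel data, width, height, bytes per pixel, bytes per line.
    // cv_image.step is the real row length in bytes, including any padding.
    ocr->SetImage(cv_image.data, cv_image.cols, cv_image.rows, 1,
                  static_cast<int>(cv_image.step));

    // The returned text must be released with delete[]
    char* str = ocr->GetUTF8Text();
    if (str) {
        std::cout << str << std::endl;
        delete[] str;
    }

    ocr->End();
    delete ocr;
    return 0;
}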
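
The last two arguments of SetImage in the listing above are the bytes per pixel and the bytes per line. For a freshly loaded grayscale image the line length equals the width, but for a cropped view (a region of interest) each row still spans the full width of the parent image. The sketch below shows that bookkeeping without depending on OpenCV or Tesseract: RawImage, SetImageArgs, describeForTesseract and crop are hypothetical names made up for this illustration, and 8-bit pixels are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the cv::Mat fields that matter here (8-bit pixels assumed).
struct RawImage {
    const std::uint8_t* data;  // first pixel, like cv::Mat::data
    int width;                 // like cv::Mat::cols
    int height;                // like cv::Mat::rows
    int channels;              // 1 for grayscale
    std::size_t step;          // bytes per row including padding, like cv::Mat::step
};

// Values in the order SetImage takes them: data, width, height,
// bytes per pixel, bytes per line.
struct SetImageArgs {
    const std::uint8_t* data;
    int width;
    int height;
    int bytes_per_pixel;
    int bytes_per_line;
};

// Derive the SetImage arguments from the image layout.
SetImageArgs describeForTesseract(const RawImage& img) {
    assert(img.data != nullptr);
    assert(img.step >= static_cast<std::size_t>(img.width) * img.channels);
    return {img.data, img.width, img.height, img.channels,
            static_cast<int>(img.step)};
}

// A view on a sub-rectangle: the data pointer moves, the step stays the same.
RawImage crop(const RawImage& img, int x, int y, int w, int h) {
    assert(x >= 0 && y >= 0 && x + w <= img.width && y + h <= img.height);
    const std::uint8_t* start =
        img.data + y * img.step + static_cast<std::size_t>(x) * img.channels;
    return {start, w, h, img.channels, img.step};
}
```

For a 40 pixel wide crop taken from a 100 pixel wide grayscale image, the width passed on is 40 while the bytes per line remain 100. This is why passing the row step is safer than passing the width when the image may be a cropped view.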

Now set up the environment:

"c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat" x64
SET PATH=%PATH%;f:\win64\bin
SET TESSDATA_PREFIX=f:\Project\tessdata
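
If initialization fails later, a missing language file is a likely cause. The following is a minimal sketch of a check you could run before calling Init. It assumes, as in the setup above, that TESSDATA_PREFIX points directly at the folder holding the .traineddata files; the helper names traineddataPath and languageAvailable are invented for this example and are not part of the Tesseract API.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>

// Build the expected location of a language model file.
// Assumption: prefix is the folder that directly contains *.traineddata.
std::string traineddataPath(std::string prefix, const std::string& lang) {
    assert(!lang.empty());
    // Append a separator unless the prefix already ends with one.
    if (!prefix.empty() && prefix.back() != '/' && prefix.back() != '\\') {
        prefix += '/';  // forward slashes are accepted on Windows as well
    }
    return prefix + lang + ".traineddata";
}

// True if TESSDATA_PREFIX is set and the model file for lang can be opened.
bool languageAvailable(const std::string& lang) {
    const char* prefix = std::getenv("TESSDATA_PREFIX");
    if (prefix == nullptr) {
        return false;
    }
    std::ifstream file(traineddataPath(prefix, lang), std::ios::binary);
    return file.good();
}
```

Calling languageAvailable("eng") at the start of main would let the program print a more specific message than the generic initialization error.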

Build your code:

cl /EHsc opencv_tesseract.cpp /If:\win64\include  ^
    /If:\opencv2\opencv\build\include  ^
    /link /LIBPATH:f:/win64/lib  ^
    /LIBPATH:f:\opencv2\opencv\build\x64\vc15\lib\  ^
    tesseract41.lib leptonica-1.81.0.lib opencv_world451.lib ^
    /machine:x64 /out:opencv_tesseract.exe

And run it:

opencv_tesseract.exe robertson.jpg
