OCR screen area (with Capture2Text)

- February 05, 2023

If you are annoyed by the current trend of posting text as images on websites, there is a solution (based on Tesseract): Capture2Text.

Capture2Text allows the user to quickly OCR a portion of the screen using a keyboard shortcut. Its SourceForge homepage offers source code and binary packages for Windows. Since it was built with Qt5, there are already several forks on GitHub that support Linux and macOS. The most advanced seems to be the xiaoyifang fork.

The latest release (4.6.3, released 2022-03-19) comes with Tesseract 4.0, so both the LSTM and the legacy engine are supported. For reasons unknown, the author is not using the latest 5.x version of Tesseract, so the OCR is a little slower than it could be. IMO this could be solved by compiling the source code yourself (perhaps with a few small code adjustments).

The idea behind Capture2Text is very simple and powerful from a productivity point of view: select an area on the screen, release the mouse button to start OCR, and get the result on the clipboard (or shown in a floating window).

For better OCR results, you can set several options: OCR language, text orientation (vertical/horizontal/auto), scaling factor, deskew capture, trim capture, bubble capture (automatic selection of the text area of the comic bubble under the mouse pointer) and line capture (identify the text line under the mouse pointer and OCR it).

For troubleshooting, you can even choose to save the captured image or an enhanced version with coordinates and timestamps.

With Regex find&replace you can even post-process the OCR output. The rules can be defined individually for each language.
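To illustrate the idea of per-language post-processing rules, here is a minimal Python sketch. It does not reproduce Capture2Text's own rule format; the `RULES` table, the `postprocess` function and the three example patterns are assumptions made up for this illustration.

```python
import re

# Hypothetical per-language rules: (pattern, replacement) pairs applied in order.
# These are illustrative examples, not rules shipped with Capture2Text.
RULES = {
    "eng": [
        (r"[ \t]+", " "),             # collapse runs of spaces and tabs
        (r"(?<=\d)[Oo](?=\d)", "0"),  # a letter O between digits is probably a zero
        (r"-\n(?=\w)", ""),           # rejoin words hyphenated across a line break
    ],
}

def postprocess(text, lang):
    """Apply the find & replace rules defined for `lang`, if there are any."""
    for pattern, replacement in RULES.get(lang, []):
        text = re.sub(pattern, replacement, text)
    return text.strip()
```

Languages without an entry in `RULES` are passed through with only surrounding whitespace trimmed, which mirrors the idea that each language gets its own rule set.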

It seems that the original intention was to provide support for Japanese manga, so there are 2 additional functions beyond OCR:

Text to Speech - reading of the OCR output (it offers the installed Windows voices, so other OS users will ...)
Translation of OCR output

So you can also use it for translating foreign words without doing copy & paste into online translation engines.

There is also a command line version that can accept screen coordinates for capture, so this could also be a nice help for automating screen tasks.
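A rough sketch of how such automation could look from Python follows. The executable name `Capture2Text_CLI.exe` and the `--screen-rect` and `--language` options are written as I recall them from the project's documentation, so treat them as assumptions and check them against `--help` of your build. The helper names `build_capture_command` and `ocr_screen_rect` are made up for this example.

```python
import subprocess

def build_capture_command(x1, y1, x2, y2, language="English",
                          exe="Capture2Text_CLI.exe"):
    """Build the argument list for OCR of one screen rectangle.

    Assumption: the CLI takes the rectangle as a single "x1 y1 x2 y2" string
    and the language by its display name; verify both with --help.
    """
    rect = f"{x1} {y1} {x2} {y2}"
    return [exe, "--screen-rect", rect, "--language", language]

def ocr_screen_rect(x1, y1, x2, y2, **kwargs):
    """Run the CLI and return whatever it prints to standard output."""
    cmd = build_capture_command(x1, y1, x2, y2, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Keeping the command construction separate from the call makes it easy to log or inspect the exact command before anything is executed, for example `build_capture_command(400, 200, 600, 300)`.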

Capture2Text seems to ignore the environment variable TESSDATA_PREFIX and only considers languages that are in its own tessdata directory. Also, it only accepts traineddata files named with a plain language code, so do not give your language variants custom names like eng_sourcecode.traineddata etc.
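A small sketch for checking a tessdata directory for files that would probably be skipped because of their name. The set of accepted codes is a placeholder you have to supply yourself (the list Capture2Text really uses is not covered here), and `split_traineddata` is a hypothetical helper name.

```python
from pathlib import Path

def split_traineddata(tessdata_dir, accepted_codes):
    """Split *.traineddata files into accepted and ignored file names.

    `accepted_codes` is a user-supplied set of language codes such as
    {"eng", "deu"}; it is an assumption, not Capture2Text's real list.
    """
    accepted, ignored = [], []
    for path in sorted(Path(tessdata_dir).glob("*.traineddata")):
        # path.stem is the file name without the ".traineddata" suffix
        (accepted if path.stem in accepted_codes else ignored).append(path.name)
    return accepted, ignored
```

Anything that ends up in the ignored list is a candidate for renaming to a plain language code before you put it into the Capture2Text tessdata directory.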
