Perform OCR on an image of typed text

You can use tesseract, a command-line OCR program, to convert an image of typed text, like this one, to plain text (txt):

Here is the terminal command to perform OCR on the example image:

tesseract image.png stdout --psm 12 --dpi 70 > output.txt 

which will output the following text:

You could also run tesseract on a scanned document like this one, directing the output to the terminal instead of a file:

which would output the following text:

For more information about the various command-line options, use tesseract --help or man tesseract.
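If you have a whole folder of images, the same command can be wrapped in a loop. The following is only a rough sketch: the function name ocr_dir is made up for illustration, it assumes the images are PNG files, and it reuses the --psm 12 and --dpi 70 values from the example above, which may need adjusting for other images.

```shell
# Hypothetical helper (the name ocr_dir is an assumption, not part of tesseract):
# run OCR on every PNG file in a directory and save the text next to each image.
ocr_dir() {
    local dir="${1:-.}"   # directory to scan; defaults to the current directory
    local img
    for img in "$dir"/*.png; do
        # If the directory has no PNG files, the pattern stays unexpanded; skip it.
        [ -e "$img" ] || continue
        # Same invocation as above: print the text to stdout, then redirect it
        # to a .txt file named after the image (image.png -> image.txt).
        tesseract "$img" stdout --psm 12 --dpi 70 > "${img%.png}.txt"
    done
}
```

After defining the function in your shell, running ocr_dir scans would write one .txt file per PNG image in a folder named scans (the folder name is just an example).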

NOTES:

References

Image sources
  • Wikipedia contributors. "Optical character recognition." Wikipedia, The Free Encyclopedia, 27 May 2021. Web. 31 May 2021.
  • Andrews' Book & Job Printing Office. "The Macon directory for 1860, containing the names of the inhabitants, a business directory and an appendix of much useful information." Washington Memorial Library (Macon, Ga.). 1860, http://dlg.galileo.usg.edu/do:zgy_mcd_dir-macon1860.
