Perform OCR on an image of typed text
You can use tesseract, an OCR command-line program, to convert an image of typed text to plain text (.txt).

Here is the terminal command to perform OCR on an example image, image.png:

    tesseract image.png stdout --psm 12 --dpi 70 > output.txt

Passing stdout as the output base makes tesseract print the recognized text, and the shell redirection (>) saves that text to output.txt. The --psm 12 option selects page segmentation mode 12 (sparse text with orientation and script detection), and --dpi 70 tells tesseract the resolution of the input image.

You could also apply the tesseract command to a scanned document, directing the output to the terminal instead of to a file (that is, omitting the > output.txt redirection).

For more information about the various command-line options, run tesseract --help or man tesseract.

NOTES:
- tesseract works on Linux, macOS, and Windows.
- The default language for tesseract is English, but it can recognize more than 100 languages (select another language with the -l option).
- The default output format for tesseract is plain text, but it can also create a searchable PDF (for example, tesseract image.png output pdf writes output.pdf).
- You can also apply OCR to TIFF images (including multipage TIFF), but not to PDF documents.

References
Image sources: Wikipedia contributors. "Optical character recognition." Wikipe