Convert e-books to txt

To convert e-books to plain text (txt) on Linux or other Unix-like systems, you can use the following command-line utilities, shown here with example terminal commands:

File type  Text conversion utilities  Examples
.djvu      1. djvutxt                 1. djvutxt input.djvu output.txt
           2. ebook-convert           2. ebook-convert input.djvu output.txt
.epub      1. epub2txt                1. epub2txt input.epub > output.txt
           2. ebook-convert           2. ebook-convert input.epub output.txt
           3. unzip                   3. unzip -c input.epub > output.txt
.doc       1. catdoc                  1. catdoc input.doc > output.txt
           2. textutil (macOS)        2. textutil -convert txt input.doc -output output.txt
           3. ebook-convert           3. ebook-convert input.doc output.txt
.pdf       1. pdftotext               1. pdftotext input.pdf output.txt
           2. ebook-convert           2. ebook-convert input.pdf output.txt
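
If you have a mixed folder of e-books, the commands above can be wrapped in a small shell function that picks a tool by file extension. This is only a sketch: the function name convert_to_txt and the DRY_RUN switch are made up for illustration, it uses one tool per file type (ebook-convert for .epub, purely to keep the example short), and it assumes those tools are installed.

```shell
# Sketch: convert one file to .txt, choosing the tool by file extension.
# convert_to_txt and DRY_RUN are illustrative names, not part of any tool.
# Set DRY_RUN=1 to print the chosen tool instead of running it.
convert_to_txt() {
    src=$1
    dst="${src%.*}.txt"    # same name with the last extension replaced by .txt
    case "$src" in
        *.djvu) tool=djvutxt ;;
        *.epub) tool=ebook-convert ;;
        *.doc)  tool=catdoc ;;
        *.pdf)  tool=pdftotext ;;
        *)      echo "unsupported file type: $src" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$tool: $src -> $dst"
        return 0
    fi
    case "$tool" in
        catdoc) catdoc "$src" > "$dst" ;;    # catdoc writes to standard output
        *)      "$tool" "$src" "$dst" ;;     # the others take input and output paths
    esac
}
```

For example, running DRY_RUN=1 and then convert_to_txt book.pdf only reports which tool would be used. Note that the match is case-sensitive, so a file named BOOK.PDF would be rejected as unsupported.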

NOTES:
  • There is a caveat for unzip: the generated output file will also include HTML tags, since EPUBs are zipped HTML files. That's why I list it third, as a quick-and-dirty solution.
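  • ebook-convert is a utility from the e-book library manager calibre; it supports many other e-book formats for text conversion. Note that calibre's documented input formats include DOCX but not the older DOC format, so the .doc example may require saving the file as .docx first.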
  • textutil accepts these input file types: txt, html, rtf, rtfd, doc, wordml, or webarchive.
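
If the HTML tags from the unzip approach get in the way, a rough cleanup with sed may be enough. The sketch below is an assumption-laden illustration, not a proper HTML parser: strip_tags and epub_to_txt_rough are hypothetical names, tags that span several lines are not handled, entities such as &amp; are left as-is, and the chapters come out in archive order, which is not necessarily reading order.

```shell
# Rough tag stripper: deletes anything that looks like <...> within a line,
# then drops lines that are left empty. Entities are not decoded.
strip_tags() {
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}

# Hypothetical wrapper: print only the (X)HTML members of an EPUB to
# standard output (unzip -p) and strip the tags from them.
epub_to_txt_rough() {
    unzip -p "$1" '*.html' '*.xhtml' '*.htm' 2>/dev/null | strip_tags
}
```

It could then be used as epub_to_txt_rough input.epub > output.txt. For anything beyond a quick look at the text, epub2txt or ebook-convert will likely give cleaner results.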


Image source: cheeseisdisgusting, CC BY-SA 3.0, via Wikimedia Commons
