(Note: The Language Hacking Guide has been converted to be read on all devices, using the techniques in this post.)
Thanks to ditching dead trees and embracing 21st century technology I have been reading a lot more lately and really enjoying myself.
I’ve got the fantastic Amazon Kindle (International free 3G version since I’m a traveller, but the $139 wifi version will be fine for most people), and you know what? After over two months, I still haven’t bought any books from the Amazon Kindle store!
Everything I have been reading has been either completely free content or an e-book that I have bought through other sites.
Getting around awkwardly formatted PDFs
Most PDFs you will come across online (for free) are actually just pure text, and the Kindle opens these directly. As long as you read in landscape mode, it renders them fantastically, and you can simply click next page and read very easily.
Unfortunately though, many bloggers who sell PDFs don’t produce them as simple text with some images; instead they format them to have double (or triple) columns on one page, put logos or other useless info in the sides, use a dark colour, bars across the top/bottom etc.
Even for reading on a computer screen I can’t say I like all of this to distract me, but no e-Reader can possibly render it automatically in an easy-to-read format. It does its best and zooms into a part of the page, but then you have to click up-down left-right to navigate instead of next page, and this destroys the reading experience.
All of the graphics also make them horrible to print out if that’s what you ultimately want to do.
Luckily I found a work-around that I have applied to blogger books I’ve been investing in, as well as to other e-books I have bought recently. In this post I’ll also share a few other sources of information to get free content to read on your eReader (iPad, iPod, Android, Kindle, Sony Reader, Nook, or whatever it may be!)
Step 1: Un-PDF-ifying your book
Adobe’s PDF format is best suited for creating non-editable documents for printing. As such, it is not ideally suited to sharing information online, especially in documents with complicated formatting.
What we need to do is take just the text (and images within the body for illustration) and use that to create our ePUB or Mobi.
You can try simply selecting the text, and copying and pasting it, but most PDF software doesn’t let you do this for the whole document.
I’ve tried a few ways to get something useful out of PDFs via conversion, including the professional version of Adobe Acrobat, and I’m still not entirely satisfied with the result for eReader conversion (there are line breaks in awkward places). It turns out the best way to get that text is to use OCR software on the PDF, as this also incorporates a step for wrapping text properly.
I like ABBYY Finereader, and had to buy it to work with PDF documents as a translator, so I still have it on my Virtual Windows box. I have also used this to convert scanned and photographed text for reading and it has excellent recognition capabilities, in several languages.
I’m hoping that soon there will be a good open source (free) alternative for OCR. Google Docs has a free PDF OCR conversion option, but it didn’t work for all of the files I sent it. Otherwise, you will find lots of free PDF to DOC (or other format) converters online, whose quality varies. Calibre, mentioned below, also works for PDF conversion.
The layout of the result may not be great depending on the source and the software, which is why I like to go through step 2 to make sure it is perfect for final conversion.
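If your converter leaves you with plain text that has line breaks in the middle of paragraphs, a small script can tidy most of it before you paste it into the Word processor. This is only a rough sketch, not something from my own workflow: it assumes paragraphs are separated by blank lines, and the function name unwrap_paragraphs is just made up for illustration.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join lines that were broken mid-paragraph.

    Assumption: a blank line marks a real paragraph break, and every
    other line break is an artifact of the PDF layout.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words hyphenated across a line break ("conver-\nsion").
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse remaining line breaks and repeated spaces into one space.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)
```

One caveat: the hyphen rule will also glue together words that really are hyphenated (like "well-known") when they happen to fall at the end of a line, so a quick read-through afterwards is still worth it.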
The goal is to ultimately open that text in Word Processing software, the best by far being (free) Open Office.
Step 2: Convert to HTML
This step may surprise you, but the format you ultimately want your book to be in before ePub/mobi conversion is basic html. Don’t worry, no coding is required.
Now, just go into Open Office’s (or other) Word processor and paste in the text of the entire book, and click “save as” and select HTML. That’s it! Once you have the text pasted in, you can tweak it to make sure it looks good (removing any page numbers / footers / inappropriate line breaks / bad formatting etc.)
When I was writing the Language Hacking Guide I did this directly from the text in the Word Processor, rather than conversion from a PDF of course. You can also paste in content from websites, or other sources to convert them into an ebook.
HTML allows you to include images, and lots of different types of formatting. If you just want a simple book that you don’t need to navigate, then go directly on to the next step.
If you want to make sure the book has a table of contents, then spend just a couple of minutes setting each of the chapter titles to be formatted as header 1 or 2. Select the title and set this in your Word processor (Top-left in OO’s Word processor). You don’t need to do anything other than that for this step.
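No coding is needed for any of this, but if you are curious what the saved file roughly boils down to, here is a minimal sketch that builds the same kind of basic HTML by hand: each chapter title becomes a header 1 and each paragraph a plain paragraph. The function name text_to_html and the way the chapters are passed in are my own assumptions for illustration; a real Word processor export will contain a lot more markup than this.

```python
from html import escape


def text_to_html(title, chapters):
    """Build a very basic HTML book.

    chapters is a list of (heading, [paragraph, ...]) pairs.
    Headings are written as <h1> so a table of contents can be
    built from them later.
    """
    parts = [
        "<html>",
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{escape(title)}</title>",
        "</head>",
        "<body>",
    ]
    for heading, paragraphs in chapters:
        parts.append(f"<h1>{escape(heading)}</h1>")
        for paragraph in paragraphs:
            # escape() keeps characters like & and < from breaking the markup.
            parts.append(f"<p>{escape(paragraph)}</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)
```

You could write the result to a file with open("book.html", "w", encoding="utf-8") and then carry on to the next step as normal.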
Step 3: Use Calibre to create the ePub/mobi
The most crucial tool in this whole process is the free application, Calibre. Download and install it (free). It’s also very useful for reading the daily news for free as explained below.
ePub is based on HTML, which is why converting to that format first is useful. HTML is also a convenient source format for Mobi, and it is easy to manipulate in a Word processor to edit out any problems before final conversion. Once you have your book in HTML format, the conversion process is very easy.
- Click “Add books” (top-left book symbol) and select the HTML file you have just created
- Select that book from the list and click “Convert books” (the third symbol in the top-left).
- In the first step of the conversion process you can put in the author, book name, and upload a picture to represent as the front page of the book if you wish.
- In the top-right, change the output file format to ePUB (most e-Readers) or MOBI (Amazon Kindle). Next click “Page setup” and select your reading device from the list so that it converts best for that screen size. Usually for ePUB the default is fine, and of course for MOBI select the Kindle.
- If you would like to add a “Table of Contents”, click this on the left and then click the magic wand beside “Level 1” and select “h1” from the drop down menu. You can leave the rest blank. If you have two levels (my Guide for example has sections as h1 and chapters as h2), set this for level 2.
- If converting to ePub, click “ePUB output” and select “Do not split on page breaks”.
- Click OK, and your ePub or Mobi file will be added to Calibre’s library directory! You can then add the file to your eReader by dragging it over directly, but Calibre detects most readers when they are connected to your computer and you can send it without needing to leave the software.
That’s it! Now you can read the book that has been converted especially for your device!
You can also go through exactly the same steps to convert an ePub to Mobi for the Kindle, by importing that ePub instead of the HTML file.
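If you have a lot of books to convert, Calibre also installs a command-line tool called ebook-convert that does the same job as the steps above without the clicking. The sketch below only builds the command; it assumes ebook-convert is on your PATH, the helper names are hypothetical, and the XPath expressions for the table of contents are my best match for the “h1” and “h2” choices in the conversion dialog. Check the options against your own Calibre version before relying on it.

```python
import shutil
import subprocess


def build_convert_command(source, output, title=None, author=None, kindle=False):
    """Return the ebook-convert argument list for one book."""
    cmd = ["ebook-convert", source, output]
    if title:
        cmd += ["--title", title]
    if author:
        cmd += ["--authors", author]
    # Build the table of contents from header 1 and header 2 tags.
    cmd += ["--level1-toc", "//h:h1", "--level2-toc", "//h:h2"]
    if kindle:
        # Roughly the same as picking the Kindle under "Page setup".
        cmd += ["--output-profile", "kindle"]
    if output.lower().endswith(".epub"):
        # Matches "Do not split on page breaks" under "ePUB output".
        cmd.append("--dont-split-on-page-breaks")
    return cmd


def run_conversion(cmd):
    """Run the command, but only if Calibre's tool can be found."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ebook-convert not found; is Calibre installed?")
    subprocess.run(cmd, check=True)
```

For example, run_conversion(build_convert_command("book.html", "book.mobi", title="My Book", kindle=True)) would convert the HTML file for the Kindle, and swapping "book.html" for an ePub file converts that ePub to Mobi instead.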
Reading international news in several languages, your favourite blogs and long articles on your eReader
Of course, I don’t just copy converted PDFs to my Kindle. I save webpages of interesting (but long) articles to read on it, and read the news daily in several languages, without paying anything.
To do this, still within Calibre, click the arrow beside “Fetch News” (the orange down arrow with “N” in it) and select “Add a custom news source”. You will need to add the RSS feed to “Feed URL” here.
I made this video to explain what RSS is for those unsure. Look for that symbol on various websites, or search for “RSS” or “feed” on the page to find it. I’ve added the feed of Le Monde, El País etc. as well as one or two blogs I like that tend to have long articles.
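For the curious, an RSS feed is just a small XML file listing the latest articles, which is what Calibre reads when it fetches the news. Here is a minimal sketch of pulling the titles and links out of one. The sample feed is made up purely for illustration (example.com is not a real news source), and I am assuming a plain RSS 2.0 layout; real feeds carry more fields and some sites use the Atom format instead.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First long article</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second long article</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def list_articles(feed_xml):
    """Return (title, link) pairs for every item in an RSS feed."""
    root = ET.fromstring(feed_xml)
    articles = []
    for item in root.iter("item"):
        articles.append((item.findtext("title"), item.findtext("link")))
    return articles
```

Calling list_articles(SAMPLE_FEED) gives back the two title and link pairs from the sample, in the order they appear in the feed.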
I also really like to use the Read It Later plugin when browsing: I click a small icon in Firefox when I come to a long interesting article on my PC. It will then be downloaded for me to read offline on my Kindle, using the feed as shown here, the next time I connect to Calibre. (Note, Instapaper also have a plugin and you can right-click to save a page, and then on your Instapaper account download the mobi file directly).
So, there you have it! As you can imagine all of these PDFs I’ve bought, and all the interesting articles I mark to read later, as well as daily news means I have no shortage of things to read! Since a lot of that content is in foreign languages, it helps a lot for my language missions to have something to read for practice, while not worrying about having lights shining into my eyes as I would with a standard computer screen thanks to e-Ink technology.
One reason people tell me they prefer other readers over the cheaper and crisper Kindle is because of being “tied” to the Amazon store. As you can see, that’s not the case for me at all.
Let me know if you’ve tried something similar for your eReaders, or if you think you’ll get one yourself for Christmas! What else do you do to access or import stuff to read? Let me know in the comments! (Note: For the first time in the history of the blog, I have had to disable comments on this post, because all new ones were just from spammy SEO companies wanting to promote their expensive PDF conversion tools. Sorry to those of you with real comments, but it was getting too much to deal with!)
This article was written by Benny Lewis