How to convert PDFs/daily news/anything to ePUB/mobi for your eReader/Kindle

Benny

(Note: The Language Hacking Guide has been converted to be read on all devices, using the techniques in this post.)

Thanks to ditching dead trees and embracing 21st century technology I have been reading a lot more lately and really enjoying myself.

I’ve got the fantastic Amazon Kindle (International free 3G version since I’m a traveller, but the $139 wifi version will be fine for most people), and you know what? After over two months, I still haven’t bought any books from the Amazon Kindle store!

Everything I have been reading has been either completely free content or an e-book that I have bought through other sites.

 

Getting around awkwardly formatted PDFs

Most PDFs you will come across online (for free) are actually just pure text, and the Kindle opens these directly. As long as you read in landscape mode, it renders them fantastically, and you can simply click next page and read very easily.

Unfortunately though, many bloggers who sell PDFs don’t produce them as simple text with some images; instead they format them to have double (or triple) columns on one page, put logos or other useless info in the sides, use a dark colour, bars across the top/bottom etc.

Even for reading on a computer screen I can’t say I like having all of this to distract me, and no e-Reader can possibly render it automatically in an easy-to-read format. It does its best and zooms into a part of the page, but then you have to click up, down, left and right to navigate instead of next page, and this destroys the reading experience.

All of the graphics also make them horrible to print out, if that’s what you ultimately want to do.

Luckily I found a work-around that I have applied to blogger books I’ve been investing in, as well as to other e-books I have bought recently. In this post I’ll also share a few other sources of information to get free content to read on your eReader (iPad, iPod, Android, Kindle, Sony Reader, Nook, or whatever it may be!)

Step 1: Un-PDF-ifying your book

Adobe’s PDF format is best suited for creating non-editable documents for printing. As such, it is not ideally suited to sharing information online, especially in documents with complicated formatting.

What we need to do is take just the text (and images within the body for illustration) and use that to create our ePUB or Mobi.

You can try simply selecting the text, and copying and pasting it, but most PDF software doesn’t let you do this for the whole document.

I’ve tried a few ways to get something useful out of PDFs via conversion, including the professional version of Adobe Acrobat, and I’m still not entirely satisfied with the result for eReader conversion (there are line breaks in awkward places). It turns out the best way to get that text is to use OCR software on the PDF, as this also includes a step for wrapping text properly.

I like ABBYY FineReader, and had to buy it to work with PDF documents as a translator, so I still have it on my virtual Windows box. I have also used it to convert scanned and photographed text for reading, and it has excellent recognition capabilities in several languages.

I’m hoping that soon there will be a good open source (free) alternative for OCR. Google Docs has a free PDF OCR conversion option, but it didn’t work for all of the files I sent it. Otherwise, you will find lots of free PDF to DOC (or other format) converters online, whose quality varies. Calibre, mentioned below, also works for PDF conversion.

The layout of the result may not be great depending on the source and the software, which is why I like to go through step 2 to make sure it is perfect for final conversion.
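If a converter leaves hard line breaks in the middle of sentences, a small script can rejoin them before you paste the text into the word processor. This is only a minimal sketch and not part of the workflow described in this post. It assumes that blank lines separate real paragraphs and that every other line break is unwanted; the function name is made up for illustration.

```python
import re


def unwrap_paragraphs(text):
    """Rejoin hard-wrapped lines into paragraphs.

    Assumption: one or more blank lines mark a real paragraph break;
    any other line break is treated as an artifact of the conversion.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace (including newlines) to one space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```

Text that uses a single line break for each paragraph would be merged into one block by this approach, so check the output by eye before converting.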

The goal is ultimately to open that text in word processing software, the best by far being (free) Open Office.

Step 2: Convert to HTML

This step may surprise you, but the format you ultimately want your book to be in before ePub/mobi conversion is basic html. Don’t worry, no coding is required.

Now, just go into Open Office’s (or another) word processor and paste in the text of the entire book. Once you have the text pasted in, you can tweak it to make sure it looks good (removing any page numbers, footers, inappropriate line breaks, bad formatting etc.). Then click “Save as” and select HTML. That’s it!

When I was writing the Language Hacking Guide I did this directly from the text in the Word Processor, rather than conversion from a PDF of course. You can also paste in content from websites, or other sources to convert them into an ebook.

HTML allows you to include images, and lots of different types of formatting. If you just want a simple book that you don’t need to navigate, then go directly on to the next step.

If you want to make sure the book has a table of contents, then spend just a couple of minutes setting each of the chapter titles to be formatted as Heading 1 or Heading 2. Select the title and set this in your word processor (top-left in OO’s word processor). You don’t need to do anything other than that for this step.
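For anyone curious what the saved file boils down to, here is a rough sketch that builds the same kind of basic HTML by hand: chapter titles become h1 elements and body text becomes plain paragraphs. It is an illustration only; the function name and the input shape (a list of chapter title and paragraph pairs) are assumptions, and a word processor’s real HTML export will contain much more markup.

```python
from html import escape


def build_html(book_title, chapters):
    """Build a minimal HTML book.

    chapters: list of (chapter_title, [paragraph, ...]) tuples
    (a hypothetical input shape chosen for this sketch).
    """
    parts = [
        "<html>",
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{escape(book_title)}</title>",
        "</head>",
        "<body>",
    ]
    for chapter_title, paragraphs in chapters:
        # h1 headings are what the table of contents is later built from.
        parts.append(f"<h1>{escape(chapter_title)}</h1>")
        for paragraph in paragraphs:
            parts.append(f"<p>{escape(paragraph)}</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)
```

Writing the returned string to a file ending in .html (saved as UTF-8) gives you something that can be added to Calibre in the next step.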

Step 3: Use Calibre to create the ePub/mobi

The most crucial tool in this whole process is the free application, Calibre. Download and install it (free). It’s also very useful for reading the daily news for free as explained below.

ePub is based on HTML, which is why converting to that format first is useful. HTML is also a convenient source format for Mobi, and it is easy to manipulate in a word processor to edit out any problems before final conversion. Once you have your book in HTML format, the conversion process is very easy.

  1. Click “Add books” (top-left book symbol) and select the HTML file you have just created
  2. Select that book from the list and click “Convert books” (the third symbol in the top-left).
  3. In the first step of the conversion process you can put in the author and book name, and upload a picture to use as the cover of the book if you wish.
  4. In the top-right, change the output file format to ePUB (most e-Readers) or MOBI (Amazon Kindle). Next click “Page setup” and select your reading device from the list so that it converts best for that screen size. Usually for ePUB the default is fine, and of course for MOBI select the Kindle.
  5. If you would like to add a “Table of Contents”, click this on the left and then click the magic wand beside “Level 1” and select “h1” from the drop-down menu. You can leave the rest blank. If you have two levels (my Guide for example has sections as h1 and chapters as h2), select “h2” for Level 2.
  6. If converting to ePub, click “ePUB output” and select “Do not split on page breaks”.
  7. Click OK, and your ePub or Mobi file will be added to Calibre’s library directory! You can then add the file to your eReader by dragging it over directly, but Calibre detects most readers when they are connected to your computer and you can send it without needing to leave the software.

That’s it! Now you can read the book that has been converted especially for your device!

You can also go through exactly the same steps to convert an ePub to Mobi for the Kindle, by importing that ePub instead of the HTML file.
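Calibre also installs a command-line tool called ebook-convert, so the steps above can be scripted if you have many files. The sketch below builds a command that mirrors those steps. The option names are taken from Calibre’s command-line documentation as I understand it, so confirm them with `ebook-convert --help` on your installed version. The Python function names and the file names in the usage note are hypothetical.

```python
import subprocess


def build_convert_command(source, output, title=None, authors=None,
                          two_level_toc=False):
    """Build an ebook-convert command line mirroring the GUI steps."""
    is_epub = output.lower().endswith(".epub")
    # Assumption: "kindle" profile for MOBI, "default" for ePUB,
    # following the advice in step 4.
    profile = "default" if is_epub else "kindle"
    cmd = ["ebook-convert", source, output, "--output-profile", profile]
    if title:
        cmd += ["--title", title]
    if authors:
        cmd += ["--authors", authors]
    # Table of contents: h1 for level 1, optionally h2 for level 2.
    cmd += ["--level1-toc", "//h:h1"]
    if two_level_toc:
        cmd += ["--level2-toc", "//h:h2"]
    if is_epub:
        # Command-line counterpart of "Do not split on page breaks".
        cmd.append("--dont-split-on-page-breaks")
    return cmd


def convert(source, output, **options):
    """Run the conversion; needs Calibre's ebook-convert on the PATH."""
    subprocess.run(build_convert_command(source, output, **options), check=True)
```

For example, `convert("book.html", "book.mobi", title="My Book", authors="Me")` would produce a Kindle file, and passing an existing ePub as the source converts it to Mobi in the same way. The output format is chosen from the extension of the output file name.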

Reading international news in several languages, your favourite blogs and long articles on your eReader

Of course, I don’t just copy converted PDFs to my Kindle. I save webpages of interesting (but long) articles to read on it, and read the news daily in several languages, without paying anything.

To do this, still within Calibre, click the arrow beside “Fetch News” (the orange down arrow with “N” in it) and select “Add a custom news source”. You will need to add the RSS feed to “Feed URL” here.

I made this video to explain what RSS is for those unsure. Look for that symbol on various websites, or search for “RSS” or “feed” on the page to find it. I’ve added the feed of Le Monde, El País etc. as well as one or two blogs I like that tend to have long articles.
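If you want to see what is actually behind a feed URL, an RSS feed is just an XML file listing recent articles with their titles and links. The sketch below reads those out with Python’s standard library. The sample feed is invented for illustration (it is not Le Monde’s or El País’s real feed), and this shows only the idea, not how Calibre itself processes feeds.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real one would be downloaded from the feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <item>
      <title>First headline</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def list_articles(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    articles = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        articles.append((title, link))
    return articles
```

Each pair corresponds to one article that a news fetcher would download and bundle into the day’s reading.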

I also really like to use the Read It Later plugin when browsing: I click a small icon in Firefox when I come to a long interesting article on my PC. It will then be downloaded for me to read offline on my Kindle, using the feed as shown here, the next time I connect to Calibre. (Note: Instapaper also has a plugin, and you can right-click to save a page and then download the mobi file directly from your Instapaper account.)

—————-

So, there you have it! As you can imagine, all of these PDFs I’ve bought, all the interesting articles I mark to read later, and the daily news mean I have no shortage of things to read! Since a lot of that content is in foreign languages, it helps a lot with my language missions to have something to read for practice. And thanks to e-Ink technology, I don’t have to worry about lights shining into my eyes as I would with a standard computer screen.

One reason people give me for preferring other readers over the cheaper and crisper Kindle is the fear of being “tied” to the Amazon store. As you can see, that’s not the case for me at all ;)

Let me know if you’ve tried something similar for your eReaders, or if you think you’ll get one yourself for Christmas! What else do you do to access or import stuff to read? Let me know in the comments! (Note: For the first time in the history of the blog, I have had to disable comments on this post, because all new ones were just from spammy SEO companies wanting to promote their expensive PDF conversion tools. Sorry to those of you with real comments, but it was getting too much to deal with!)


  • http://twitter.com/maximevaly maxime valy

    Hi Benny! FYI my company offers free OCR and PDF compression services for electronic documents when you open an account on http://www.xambox.com/ So that was a shameless but nonetheless useful little plug :) On another note, I bought your e-book for learning German. It’s excellent, but no miracle occurred: I still need to work a lot to become good enough. I hope the next language (because there certainly will be one) will be easier! See you at JES! Best wishes, Maxime

    • http://www.fluentin3months.com/ Benny the language hacker

      Thanks for sharing!
      See you at JES :D

  • http://www.neverendingvoyage.com Erin

    That is very useful Benny! We are definitely planning to order a Kindle when we get settled with an address. I only wish I had it now so I could use it to read the 23 guides I just bought as part of this fantastic offer.

    • http://www.fluentin3months.com/ Benny the language hacker

      That’s the big thing about having my Kindle – it helps me catch up on all those files I’ve been accumulating to “read some day”. Having them in a folder on the computer waiting just doesn’t cut it.

      I’ll get started on my 23 guides soon :)

  • Anonymous

    Is this by any chance a reply to my e-mail? If not, thanks anyway =D

    • http://www.fluentin3months.com/ Benny the language hacker

      It usually takes me a few days to reply to e-mails. I just checked, and yours was about using Calibre as the PDF conversion tool, which isn’t something I’d recommend if it is going so slowly for you. Doing it directly will also not allow you to improve the layout of the file or add a TOC.

      Better to go through the steps outlined here ;)

  • http://dgryski.blogspot.com Damian Gryski

    I find http://www.instapaper.com/ a great addition to an ebook reader. Like ReadItLater, it comes with a bookmarklet that lets you grab long blog posts or webpages to be “read later”. However, Instapaper has the advantage that you can download the day’s readings as an epub, and it reformats articles to remove all the extra crap and leave just the text. Combine that with Dropbox, and I can do my reading at my leisure on my Android phone. (I won’t put my Dropbox referral link here ’cause I think that would just be slimy. However, if you have multiple computers, or a computer and a smart phone, it makes sharing and syncing files between them really easy.)

    • http://www.fluentin3months.com/ Benny the language hacker

      Thanks! Didn’t know it had a direct conversion option. But in general I find Instapaper’s saving option to be much slower than Read It Later. Thanks to RIL I have a plugin where I literally just click once on a tick to the right of the address bar and that’s it. In Google Reader, it’s the same.

      Instapaper needs to be dragged onto a bookmarks bar, which I don’t use, and in Google Reader it needs an extra click on “Share”. It’s not extremely inconvenient, but it would be nicer if they made it fewer clicks away. Still, I’ll try it out and see if the conversion process is any easier ;)

  • http://twitter.com/cmsadler cmsadler

    Thanks Benny! This is so timely for me. The other day I was looking for a PDF to ePub converter to read books on my Android. After a few minutes of googling, I found some converters, but I hadn’t started trying them out. It’s good to know about the PDF -> HTML -> ePub step. I’m going to try your recommendations soon.

    I am also looking at the Kindle, since it seems pretty lightweight but of course has a much bigger screen than my Android phone.

    • http://www.fluentin3months.com/ Benny the language hacker

      Yes, I recommend using the free converters, but to a DOC and then editing that appropriately as HTML and then following the steps here ;)

      I read on the Android while waiting in buses etc. before getting the Kindle. The difference is immense. You’ll hurt your eyes looking at a small screen flashing light at you for longer than a few minutes! I love my Android, but unless it’s for quick reference I can’t recommend it for comfortable reading. Get a Kindle ;)

      • Daniel

        Thanks for the article.
        I also got myself a Kindle as a measure against my ballooning Amazon book bill.
        Unfortunately, good foreign-language dictionaries are (still) in short supply on the Kindle, but maybe that will change…

  • Hook888888899999

    Helpful. But that is an ARDUOUS process. Someone needs to dev some app that converts hundreds of large PDFs to HTML.

  • http://www.fluentin3months.com/ Benny Lewis

    Yes, but it’s so much work to correct some of its mistakes on scanned books that I’d only use it for books with real text within them if possible, and definitely not for certain languages.

  • http://www.thepanamericans.net Mark David Robertson

    Clearly HTML/ePub/Mobi are to be added to your list. This was utterly helpful. The future of reading belongs to easily accessed, easy-to-read electronic texts. 

    This was a great hacking guide. I’ve been looking for it for a while.

  • http://www.fluentin3months.com/ Benny Lewis

    No. ePub is based on HTML format, you have to set it up this way or the layout of your document may not work well – it also lets you add in chapter titles very easily.

  • http://pulse.yahoo.com/_KEWXIS2FJBYESE72LWYC7VALRM Lian Zeng

    Benny, you have no idea what relief and joy your tips on PDF-Mobi conversion have brought me. I am 60 years old and very much into reading ebooks in mobi format, as I can highlight bits that interest me. I tried several sites but was worried about safety issues, so I kept coming back to your instructions. How amazing Calibre is, I never knew it was there, free to use… so easy… and free newspaper downloads… just unbelievable. Thanks a million for sharing your knowledge online… you are so generous and kind… I would call you a Godsend.

    Lian

  • Matt

    I have a copy of Adobe writer from CS5. It has an export to HTML feature located under File -> Export. I am now trying to convert the result to mobi. It is taking a little longer than a straight PDF conversion.
    Matt

  • http://www.absolutejoynow.com Debbie Takara Shelor

    Thanks so much for sharing this. I was looking for information not so much for converting to read on my kindle, but to publish in kindle format. I’m writing a document in Word and will be uploading it to printed form on Create Space. I then want to turn it around and publish it for Kindle.  I’m going to try using your procedure.

    I think I can go straight from Word to html. I think I’ll try that first.

    Thanks again for all the insights about how to do this.

  • Abhay Kalyankar

    Pretty informative. For some reason Calibre fails to recognise my new Samsung Galaxy S2. Any reason you can think of?

    • http://www.fluentin3months.com/ Benny Lewis

      Not sure why that would be the case, (you can search online to see if others are having the same problem) but it doesn’t matter. Just save your end file to your desktop and then drag it into your smartphone, problem solved ;)

  • http://pulse.yahoo.com/_FVL6XTTRKQ2JE4SIIKXMLNELPY sandy

    Interesting, pdf convert can convert PDF to ePub, HTML, text, JPG and other images: http://www.epubor.com/pdf-epub-tools.html
    Also, epub mobi converter can convert ePub to Mobi for Kindle Fire easily, enjoy!!
    http://www.epubor.com/epub-to-mobi.html

  • StartupTunes

    Great tutorial, I really like it !!

  • http://twitter.com/nadiadrm nadiazhou

    Hi, I’m Nadia. Thanks for your sharing! Make a good life with PDF to EPUB Converter for Mac:
    http://www.doremisoft.com/pdf/pdf-to-epub-converter-mac.html
    http://www.pdf-to-epub.com/for-mac.html

  • http://www.fluentin3months.com/ Benny Lewis

    Very useful, thanks!

  • http://androidflip.com Kuldeep Singh

    This guide helped me a lot. I had been trying to convert a PDF to Mobi since this morning, but with no success.

    It did convert once, when I used an “ebook to epub” program, but the next time the converter did not work.

    Anyway, after reading this article my PDF file finally did get converted to .Mobi. Now reply in this language. PS. The language is HINDI

    • Shaktiman

      Are you an idiot?

  • http://www.facebook.com/people/Emilio-Martí-López/100002777051300 Emilio Martí López

    Thanks a lot for your investigation and sharing! Keep on teaching freely ’round the world. Muchas gracias!

  • Anonymous

    I found a PDF to ePub converter more powerful than Calibre: http://www.pdf-epub-converter.com

  • Nataliya

    How did you get your 3G to work in Europe? On my Kindle the only website I can use with 3G is Amazon. I was so pissed when I couldn’t get it to work in more than 5 countries, like Ukraine, the Netherlands, Germany, Denmark, Sweden or Belgium. Is there a trick?

  • http://www.facebook.com/people/Ovidiu-Oprea/100001748222885 Ovidiu Oprea

    Problem with line breaks.

    The mobi file uses the line breaks in the original html file to start new paragraphs. That means I get a new paragraph in mid-sentence every other line, which makes my mobi file unreadable.

    This is how it looks on my Kindle:

    “The mobi file uses the line breaks in the original html file

    to start
    new paragraphs. That means I get a new paragraph in

    mid-sentence every
    other line, which makes my mobi file unreadable. ”

    Can someone please help me fix this?

  • http://www.versedtech.org/ Sudip Majhi

    EBook Glue is also a very cool app that can be used for this purpose.

  • http://www.facebook.com/gopal.harlow Gopal Harlow

    I have a cookbook that is in PDF format with all the links between recipes and index/contents done. As there are 275 recipes, this took a lot of work. Is there any way to preserve this formatting and get it into Mobi? Thanks for the guide, it’s been of great help by the way.

  • Christine Weald

    Thank you for the clear, simple steps to make reading PDFs easier in e-book format.