Quick and dirty way to rip an eBook from Android

I recently bought a book for my MSc which was only available through a crappy Android app. There was no obvious way to decrypt it to read on a more capable device, so I resorted to the ancient art of screenscraping.

Here’s a quick-and-dirty way to grab images of the pages and convert them to a basic PDF using Linux. There is a lot more you could do to make the final book more useful, but this’ll get you started.

Lots of Screenshots

With a USB cable plugged into my phone and laptop (and USB debugging enabled on the phone), I wrote a nasty little bash script:

for i in {00001..00555}; do
   adb exec-out screencap -p > "$i.png"
   adb shell input tap 1000 2000
   sleep 1s
done
echo All done

This runs a loop 555 times. Each pass takes a screenshot, names it after the loop number with padded zeros, taps the bottom right of the screen, then waits a second to make sure the page has refreshed. Slow and stupid, but it works reliably.
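A slightly more defensive sketch of the same loop, wrapped in a function so the page count, tap position and delay can be changed without editing the body. The `capture_pages` name and its arguments are my own, hypothetical additions, and zero-padding is done with `printf` rather than relying on Bash's brace-expansion padding. The default tap coordinates still assume a 1080×2160 screen.

```bash
#!/usr/bin/env bash
# Hypothetical helper: capture PAGES screenshots from the connected device.
# Usage: capture_pages PAGES [TAP_X] [TAP_Y] [DELAY_SECONDS]
capture_pages() {
    local pages=$1 x=${2:-1000} y=${3:-2000} delay=${4:-1}
    local i name
    for (( i = 1; i <= pages; i++ )); do
        # Zero-pad to five digits so the files sort in page order.
        printf -v name '%05d.png' "$i"
        # Stream the current screen as a PNG straight to the host.
        adb exec-out screencap -p > "$name" || return 1
        # Tap near the bottom right to turn the page.
        adb shell input tap "$x" "$y" || return 1
        # Give the app time to redraw before the next capture.
        sleep "$delay"
    done
    echo "All done"
}
```

Calling `capture_pages 555` reproduces the original run; stopping on the first `adb` failure avoids silently writing hundreds of empty files if the cable is knocked out.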

Images range from 200KB to 2MB depending on complexity. Back them up before doing the next bit.
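Because the cropping and trimming steps overwrite the screenshots in place, it is worth confirming the backup is complete before going on. A minimal sketch, assuming the PNGs sit in the current directory; the `backup_pages` name and the timestamped `backup-` directory are illustrative, not part of the original workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy every PNG into a backup directory and
# check that the number of files matches.
backup_pages() {
    local dest=${1:-backup-$(date +%Y%m%d-%H%M%S)}
    local src_count dest_count
    mkdir -p "$dest" || return 1
    # -p preserves timestamps; fails if there are no PNGs to copy.
    cp -p -- *.png "$dest"/ || return 1
    src_count=$(find . -maxdepth 1 -name '*.png' | wc -l)
    dest_count=$(find "$dest" -maxdepth 1 -name '*.png' | wc -l)
    if [ "$src_count" -ne "$dest_count" ]; then
        echo "Backup incomplete: $dest_count of $src_count files" >&2
        return 1
    fi
    echo "$dest_count files backed up to $dest"
}
```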

Crop Images

The screenshots are all 1080×2160, but the page only takes up part of that. The top-left corner is at (50, 432) and the bottom-right corner is at (1028, 1726).

This command crops all the images. It is destructive, so make sure you have a backup.

mogrify -crop 978x1294+50+432 +repage *.png
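The crop geometry is width×height plus the x and y offsets of the top-left corner, so it follows directly from the two corners measured above: 1028 - 50 = 978 and 1726 - 432 = 1294. A small helper makes that arithmetic repeatable for a different phone or app; `crop_geometry` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn top-left (x1, y1) and bottom-right (x2, y2)
# corners into an ImageMagick geometry string WIDTHxHEIGHT+X+Y.
crop_geometry() {
    local x1=$1 y1=$2 x2=$3 y2=$4
    if (( x2 <= x1 || y2 <= y1 )); then
        echo "bottom-right must be below and to the right of top-left" >&2
        return 1
    fi
    printf '%dx%d+%d+%d\n' $(( x2 - x1 )) $(( y2 - y1 )) "$x1" "$y1"
}
```

`crop_geometry 50 432 1028 1726` prints `978x1294+50+432`, so it can be dropped straight into `mogrify -crop "$(crop_geometry 50 432 1028 1726)" +repage *.png`.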

It’s also useful to trim the images to remove any plain border (usually whitespace) around the page. That makes for a smaller file size.

mogrify -trim *.png

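Images can also be reduced in size with pngquant, which lossily converts them to a palette of at most 256 colours. By default it writes new *-fs8.png files alongside the originals, which would then end up in the PDF twice, so tell it to overwrite in place:

pngquant --force --ext .png *.png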
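To see whether trimming and pngquant are actually paying off, you can total the bytes before and after each step. A minimal sketch; `png_bytes` is a hypothetical helper, and `wc -c` is used as a simple, portable way to count bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the combined size in bytes of all PNGs
# in a directory (default: the current directory).
png_bytes() {
    local dir=${1:-.} total=0 size f
    for f in "$dir"/*.png; do
        # With no matches the pattern stays literal, so skip it.
        [ -e "$f" ] || continue
        size=$(wc -c < "$f")
        total=$(( total + size ))
    done
    echo "$total"
}
```

For example, run `before=$(png_bytes)` before `mogrify -trim *.png`, then `echo $(( before - $(png_bytes) ))` afterwards to see how many bytes the step saved.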

Create PDF

Sticking all the images together into a single PDF is relatively easy:

convert *.png +repage output.pdf

The +repage option discards the virtual-canvas offset left behind by cropping and trimming, so each page keeps the size and aspect ratio of its trimmed image.
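The zero-padded file names from the capture loop matter here: the shell expands `*.png` in sorted order, so `00002.png` comes before `00010.png`, whereas unpadded names would put `10.png` ahead of `2.png` and scramble the pages. A quick sanity check before building the PDF, as a sketch; `check_page_order` is a hypothetical name, and it assumes the five-digit names used above.

```bash
#!/usr/bin/env bash
# Hypothetical check: confirm that *.png expands in page order with no
# gaps, assuming names of the form 00001.png, 00002.png, ...
check_page_order() {
    local expected=1 f n
    for f in *.png; do
        # With no matches the pattern stays literal, so bail out.
        [ -e "$f" ] || { echo "no PNG files found" >&2; return 1; }
        n=${f%.png}
        if [[ ! $n =~ ^[0-9]+$ ]]; then
            echo "unexpected file name: $f" >&2
            return 1
        fi
        # 10# forces base 10 so leading zeros are not read as octal.
        if (( 10#$n != expected )); then
            echo "expected page $expected but found $f" >&2
            return 1
        fi
        expected=$(( expected + 1 ))
    done
    echo "$(( expected - 1 )) pages in order"
}
```

A missing page, or a stray file such as a leftover `00001-fs8.png`, makes the check fail before `convert` bakes it into the PDF.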

But there’s no text to search. There are a bunch of OCR programs on Linux; I like PDF Sandwich:

pdfsandwich -rgb -nopreproc output.pdf

That’ll give you a colour PDF with OCR’d text embedded in it. The text is “sandwiched” behind the image of the page, so you can’t see it but you can search for it.

You can also use OCRmyPDF, which may produce a smaller file:

ocrmypdf -l eng output.pdf output_ocr.pdf
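If you are not sure which OCR tool is installed, a small wrapper can prefer OCRmyPDF and fall back to PDF Sandwich. This is a sketch: the `ocr_pdf` name is hypothetical, and it only uses the flags shown above. It relies on pdfsandwich’s default of writing `NAME_ocr.pdf` next to its input, so both branches end up with the same output file name.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: add an OCR text layer to a PDF, producing
# NAME_ocr.pdf. Prefers OCRmyPDF, falls back to pdfsandwich.
ocr_pdf() {
    local in=$1
    local out="${in%.pdf}_ocr.pdf"
    if command -v ocrmypdf >/dev/null 2>&1; then
        ocrmypdf -l eng "$in" "$out"
    elif command -v pdfsandwich >/dev/null 2>&1; then
        # pdfsandwich writes NAME_ocr.pdf itself by default.
        pdfsandwich -rgb -nopreproc "$in"
    else
        echo "neither ocrmypdf nor pdfsandwich is installed" >&2
        return 1
    fi
}
```

Usage is simply `ocr_pdf output.pdf`.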

And that’s it. I now have a searchable PDF which I can read on any device.

What have we learned?

DRM on textbooks is an annoyance. For computer science books, it is little more than a fig leaf.

Charlie Layers