I recently purchased a book for my MSc which was only available via a crappy Android app. There was no obvious way to decrypt it to read on a more sensible device, so I resorted to the ancient art of screenscraping.
Here’s a quick-and-dirty way to grab images of the pages and convert them to a standard PDF using Linux. There’s a lot more you can do to make the final book more useful, but this’ll get you started.
Lots of Screenshots
With a USB cable plugged into my phone and laptop, I wrote a nasty little bash script:
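#!/bin/bash
for i in {00001..00555}; do
    # Save a screenshot of the current page as a zero-padded PNG
    adb exec-out screencap -p > "$i.png"
    # Tap the bottom right of the screen to turn the page
    adb shell input tap 1000 2000
    # Wait for the next page to render
    sleep 1s
done
echo "All done"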
This runs a loop 555 times. Each pass takes a screenshot, names it after the loop number with padded zeros, taps the bottom right of the screen, then waits for a second to make sure the page has refreshed. Slow and dumb, but it works reliably.
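The same loop can be sketched as a pair of small functions, which makes it easier to change the page count or tap position. This is only a rough variant, not the script above: `page_name` and `capture_pages` are made-up helper names, the default tap coordinates are the ones already used in the script, and stopping when `adb` fails is an extra assumption.

```shell
# page_name and capture_pages are hypothetical helper names, not part of adb.

# Print a zero-padded file name, e.g. 7 -> 00007.png
page_name() {
    printf '%05d.png' "$1"
}

# Capture pages 1..total, tapping at (x, y) to turn the page after each one.
capture_pages() {
    local total="$1" x="${2:-1000}" y="${3:-2000}" n=1
    while [ "$n" -le "$total" ]; do
        # Stop early if the screenshot fails (e.g. the phone was unplugged).
        adb exec-out screencap -p > "$(page_name "$n")" || return 1
        adb shell input tap "$x" "$y" || return 1
        sleep 1   # give the app time to redraw the next page
        n=$((n + 1))
    done
}

# Usage: capture_pages 555
```

The zero padding matters later on: it keeps `*.png` in page order when the shell expands the glob.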
Images vary from 200KB to 2MB depending on complexity. Back them up before doing the next bit.
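A backup can be as simple as copying the screenshots into a sub-directory. A minimal sketch, where `backup_pages` is a made-up helper name and `backup` is an assumed default folder name:

```shell
# backup_pages is a hypothetical helper: copy every PNG in the current
# directory into a backup folder, preserving timestamps.
backup_pages() {
    local dest="${1:-backup}"   # "backup" is an assumed default name
    mkdir -p "$dest" || return 1
    cp -p -- *.png "$dest"/
}

# Usage: backup_pages originals
```

Because the copies live in a sub-directory, a later `*.png` in the working directory won't touch them.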
Cropping
The screenshots are all 1080×2160. But the page only takes up part of that. The top left corner is at 50×432 and the bottom right is at 1028×1726.
This command crops all the images. It is destructive, so make sure you have a backup.
mogrify -crop 978x1294+50+432 +repage *.png
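The geometry in that command is just width x height plus the offset of the top left corner, all derived from the two corner coordinates. A small sketch of the arithmetic, where `crop_geometry` is a made-up helper name:

```shell
# crop_geometry is a hypothetical helper: turn two corner coordinates into
# an ImageMagick geometry string of the form WIDTHxHEIGHT+X+Y.
crop_geometry() {
    local left="$1" top="$2" right="$3" bottom="$4"
    echo "$((right - left))x$((bottom - top))+${left}+${top}"
}

crop_geometry 50 432 1028 1726   # prints 978x1294+50+432
```

If your phone has a different resolution, measure the two corners on one screenshot and feed them in the same way.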
It’s also useful to trim the images to remove any whitespace from the borders. That makes for a smaller file size.
mogrify -trim *.png
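Images can also be reduced in size with:
pngquant *.png
By default pngquant writes new files ending in -fs8.png rather than overwriting the originals, so move the originals out of the directory before the next step, or every page will appear twice in the PDF.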
PDF and OCR
Sticking all the images together into a single PDF is pretty easy:
convert *.png +repage output.pdf
The +repage option resets the canvas information left over from cropping and trimming, so each PDF page keeps the size and aspect ratio of the trimmed image.
But there’s no text to search through. There are a bunch of OCR programs on Linux; I like PDF Sandwich:
pdfsandwich -rgb -nopreproc output.pdf
That’ll get you a colour PDF with OCR’d text embedded in it. The text is “sandwiched” behind the image of the page, so you can’t see it but can search for it.
You can also use OCRmyPDF, which can result in a smaller file:
ocrmypdf -l eng output.pdf output_ocr.pdf
And that’s it. I now have a searchable PDF which I can read on any device.
What have we learned?
DRM on textbooks is an annoyance. For computer science books, it is little more than a fig-leaf.