I got myself a Supernote A5X (awesome device, btw), and since it doesn’t have a web browser I wanted a way to read news on it. I hacked this utility together in a couple of days and it works really well for me, so I thought it might be interesting to others. It can also be used as a noise-free newspaper generator, since it strips images, ads, links and other clutter.
(there is a screenshot of the first page of the generated pdf)
It scrapes (news) websites for content and puts it into a PDF. For me the PDF location is my Dropbox Supernote directory, so my setup is to run this thing daily and have a fresh PDF with news whenever I want it.
It’s probably rough around the edges (crawl support currently covers The Verge, Ars Technica and Engadget), but I think it’s a good base, so if anyone wants to contribute, feel free. Some of the stuff I want to add is pictures (maybe), and maybe parsing the article HTML so the output keeps font styling and similar formatting.
I’ve tried to generalize it as much as possible, so the crawling is pretty much automatic and is controlled by a config file where you define “rules” for how to parse each website.
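To give a rough idea of what rule-driven parsing can look like, here is a minimal sketch using only the Python standard library. It is not the tool's actual code or config format: the rule keys (`container`, `skip`), the `example-site` entry and the `RuleExtractor` / `extract_text` names are all made up for illustration. It also assumes reasonably well-formed HTML where non-void tags are explicitly closed.

```python
from html.parser import HTMLParser

# Hypothetical rule format, for illustration only (not the tool's real schema).
RULES = {
    "example-site": {
        "container": ("div", "article-body"),  # tag + class wrapping the article
        "skip": {"script", "style", "figure", "aside"},  # subtrees to drop
    },
}

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RuleExtractor(HTMLParser):
    """Collects text inside the rule's container, ignoring skipped subtrees."""

    def __init__(self, rule):
        super().__init__(convert_charrefs=True)
        self.tag, self.cls = rule["container"]
        self.skip = rule["skip"]
        self.depth = 0       # nesting depth inside the container (0 = outside)
        self.skip_depth = 0  # nesting depth inside a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            # Outside the article: only look for the container itself.
            classes = (dict(attrs).get("class") or "").split()
            if tag == self.tag and self.cls in classes:
                self.depth = 1
            return
        self.depth += 1
        if self.skip_depth or tag in self.skip:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible text inside the container, whitespace-normalized.
        if self.depth and not self.skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def extract_text(html, rule):
    parser = RuleExtractor(rule)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

With a rule like this, supporting another site would mostly mean adding another entry to `RULES` rather than writing new parsing code. Link text is kept but the link targets are dropped, and anything outside the container (navigation, ads) never reaches the output.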