I just released shot-scraper-template, a GitHub repository template that helps you start taking automated screenshots of a web page by filling out a form.
shot-scraper is my command-line tool for taking screenshots of websites and scraping data from them using JavaScript.
One of its uses is to help create and maintain screenshots for documentation, making it easy to update them to reflect changes to the design of the underlying pages.
To make this as easy as possible, I’ve created a GitHub repository template that automates the process of setting up shot-scraper to run against a URL.
To try it out, start here:
https://github.com/simonw/shot-scraper-template/generate
Pick a name for your new repository and paste the URL of the page you want to screenshot into the description field.
Then click “Create repository from template”.
That’s it! Your new repository will be created, a GitHub Actions automation script will run for a few seconds and your new screenshot will be added to the repository as a file called shot.png.
Here’s an example repository I created using the template: simonw/simonwillison-net-shot. The shot.png file in that repo is the screenshot it produced.
You can re-take the screenshot any time you want by clicking the “Run workflow” button in the Actions tab.
Your repository will have a file in it called shots.yml that initially looks like this:
- url: https://simonwillison.net/
  output: shot.png
  height: 800
You can edit that file to change the settings that apply to your screenshot, or to add more URLs to take shots of, like this:
- url: https://simonwillison.net/
  output: shot.png
  height: 800
- url: https://www.example.com/
  output: example.png
  height: 800
More options are available, as described in the shot-scraper README.
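If you have a long list of pages, it may be easier to generate the file than to write it by hand. The following is a minimal sketch that assumes only the three keys shown above (url, output, height); `renderShotsYaml` is a hypothetical helper name, not part of shot-scraper.

```javascript
// Hypothetical helper: render a list of shot definitions as shots.yml text.
// Only the url, output and height keys from the examples above are handled.
function renderShotsYaml(shots) {
  const entries = shots.map(({ url, output, height }) =>
    [`- url: ${url}`, `  output: ${output}`, `  height: ${height}`].join("\n")
  );
  // One entry per list item, with a trailing newline at the end of the file.
  return entries.join("\n") + "\n";
}

const yaml = renderShotsYaml([
  { url: "https://simonwillison.net/", output: "shot.png", height: 800 },
  { url: "https://www.example.com/", output: "example.png", height: 800 },
]);
console.log(yaml);
```

This sketch does no quoting, so a URL containing characters that are special in YAML (such as " #") would need extra care.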
How this works
This whole system is based around a single GitHub Actions workflow, in .github/workflows/shots.yml.
Here’s an annotated copy of that workflow showing how it all works.
name: Take screenshots

on:
  push:
  workflow_dispatch:
The workflow triggers when a change is pushed to the repository (including edits to the shots.yml file) or when the user manually clicks “Run workflow”.
jobs:
  shot-scraper:
    runs-on: ubuntu-latest
    if: ${{ github.repository != 'simonw/shot-scraper-template' }}
This is the trick that makes everything else work, which I picked up from Bruno Rocha last year. It ensures that this workflow job only runs on copies of the template, not on the original template repository itself.
This is important because a later step creates a file in the repository, if it doesn’t yet exist, based on the description URL provided by the user.
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.10
      uses: actions/setup-python@v2
      with:
        python-version: "3.10"
    - uses: actions/cache@v2
      name: Configure pip caching
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
        restore-keys: |
          ${{ runner.os }}-pip-
This is boilerplate that I use in most of my GitHub Actions workflows: it sets up Python 3.10, and also configures a cache so that the Python dependencies listed in the requirements.txt file persist from one run to the next without having to be re-downloaded from PyPI.
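To make the caching behaviour concrete, here is a rough sketch of how a key of this shape behaves. It is an approximation only: GitHub’s hashFiles has its own hashing scheme, so the plain SHA-256 below is a stand-in, the requirements.txt contents are made up, and `cacheKey` and `findCacheHit` are hypothetical helper names.

```javascript
const crypto = require("crypto");

// Stand-in for ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}.
// A plain SHA-256 of the file contents is an assumption for illustration.
function cacheKey(os, requirementsText) {
  const digest = crypto.createHash("sha256").update(requirementsText).digest("hex");
  return `${os}-pip-${digest}`;
}

// Try the exact key first, then fall back to any saved key that starts with
// one of the restore-keys prefixes (for example "Linux-pip-").
function findCacheHit(key, restoreKeys, savedKeys) {
  if (savedKeys.includes(key)) return key;
  for (const prefix of restoreKeys) {
    const match = savedKeys.find((saved) => saved.startsWith(prefix));
    if (match) return match;
  }
  return null;
}

// Hypothetical file contents before and after adding a dependency.
const oldKey = cacheKey("Linux", "shot-scraper\n");
const newKey = cacheKey("Linux", "shot-scraper\nsome-other-package\n");

console.log(findCacheHit(oldKey, ["Linux-pip-"], [oldKey])); // exact match
console.log(findCacheHit(newKey, ["Linux-pip-"], [oldKey])); // prefix fallback
console.log(findCacheHit(newKey, ["Linux-pip-"], []));       // nothing cached yet
```

The point of the fallback is that editing requirements.txt changes the exact key, but an older pip cache can still be restored through the prefix, so only the new packages need downloading. The real action picks the most recently created cache among prefix matches; this sketch simply takes the first.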
    - name: Cache Playwright browsers
      uses: actions/cache@v2
      with:
        path: ~/.cache/ms-playwright/
        key: ${{ runner.os }}-browsers
shot-scraper uses Microsoft’s open source Playwright browser automation tool. Playwright works by installing its own full Chromium browser. This step configures a cache for that browser, so that future invocations of the Action don’t have to download another copy.
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
    - name: Install Playwright dependencies
      run: |
        shot-scraper install
The pip install line here installs the shot-scraper CLI tool, which is written in Python.
The shot-scraper install line then triggers the Playwright mechanism to download and install the browser. This will do nothing if the browser has already been cached.
    - uses: actions/github-script@v6
      name: Create shots.yml if missing on first run
      with:
        script: |
          const fs = require('fs');
          if (!fs.existsSync('shots.yml')) {
            const desc = context.payload.repository.description;
            let line = '';
            if (desc && (desc.startsWith('http://') || desc.startsWith('https://'))) {
              line = `- url: ${desc}` + '\n  output: shot.png\n  height: 800';
            } else {
              line = '# - url: https://www.example.com/\n#   output: shot.png\n#   height: 800';
            }
            fs.writeFileSync('shots.yml', line + '\n');
          }
This is the other key piece of magic. It uses GitHub’s github-script action, which provides a Node.js environment with a context object containing details about the Actions run.
It starts by reading the repository description from context.payload.repository.description.
Then it creates a shots.yml file based on that description, but only if the file doesn’t already exist.
If the description is missing, or does not start with http:// or https://, it writes a commented-out configuration instead, which looks like this:
# - url: https://www.example.com/
#   output: shot.png
#   height: 800
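The branching in that script is easier to check when it is pulled out into a plain function. The sketch below restates the same decision outside of GitHub Actions; `initialShotsYaml` is a hypothetical name, and the description is passed in directly rather than read from the context object.

```javascript
// Sketch of the decision the github-script step makes, as a pure function.
// initialShotsYaml is a hypothetical name, not part of the template.
function initialShotsYaml(description) {
  const isUrl =
    typeof description === "string" &&
    (description.startsWith("http://") || description.startsWith("https://"));
  const lines = isUrl
    ? [`- url: ${description}`, "  output: shot.png", "  height: 800"]
    : ["# - url: https://www.example.com/", "#   output: shot.png", "#   height: 800"];
  // The trailing newline mirrors what the workflow step writes to disk.
  return lines.join("\n") + "\n";
}

console.log(initialShotsYaml("https://simonwillison.net/"));
console.log(initialShotsYaml("Screenshots of my site")); // not a URL: commented-out config
console.log(initialShotsYaml(null)); // no description: commented-out config
```

Because the fallback is entirely commented out, a repository created without a URL in its description ends up with a shots.yml file that defines no screenshots until someone edits it.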
The next step is to take the screenshots:
    - name: Take shots
      run: |
        shot-scraper multi shots.yml
shot-scraper multi runs through the YAML file and takes each of the screenshots configured there in turn; see the shot-scraper documentation for details.
The final step is to commit and push the new shots.yml and shot.png files to the repository:
    - name: Commit and push
      run: |-
        git config user.name "Automated"
        git config user.email "actions@users.noreply.github.com"
        git add -A
        timestamp=$(date -u)
        git commit -m "${timestamp}" || exit 0
        git pull --rebase
        git push
This uses a pattern I describe in this TIL.
GitHub Actions as a platform
I tweeted this the other day, just before I came up with the idea for the shot-scraper-template repository.
Honestly think GitHub Actions might be my favourite serverless platform right now
– Simon Willison (@simonw) March 13, 2022
This project demonstrates why. The number of complex moving parts involved in shot-scraper-template is fairly bewildering, but the end result is a free tool that anyone can use to start taking automated screenshots.
And it doesn’t cost me anything to provide it to them either!