Show HN: Instantly create a GitHub repository to take screenshots of a web page

I just released shot-scraper-template, a GitHub repository template that helps you start taking automated screenshots of a web page by filling out a form.

shot-scraper is my command line tool for taking screenshots of websites and scraping data from them using JavaScript.

One of its uses is to help create and maintain screenshots for documentation, making it easy to update them to reflect changes to the design of the underlying pages.

To make this as easy as possible, I’ve created a GitHub repository template that automates the process of setting up shot-scraper to run against a URL.

To try it out, start here:

https://github.com/simonw/shot-scraper-template/generate

Screenshot of the 'create new repository from shot-scraper-template' page, which asks for a repository name and a description. The URL for the page you want to take screenshots of goes in the description.

Pick a name for your new repository and paste the URL of the page you want to screenshot into the description field.

Then click “Create repository from template”.

That’s it! Your new repository will be created, a GitHub Actions automation script will run for a few seconds, and your new screenshot will be added to the repository as a file called shot.png.

Here’s an example repository I created using the template: simonw/simonwillison-net-shot. And here’s the shot.png file from that repo:

A screenshot of simonwillison.net

You can re-take the screenshot any time you want by clicking the “Run workflow” button in the Actions tab:

Click Actions, Take screenshots, Run workflow and then Run workflow

Your repository will have a file in it called shots.yml that initially looks like this:

- url: https://simonwillison.net/
  output: shot.png
  height: 800

You can edit that file to change the settings that apply to your screenshot, or to add more URLs to take shots of, like this:

- url: https://simonwillison.net/
  output: shot.png
  height: 800
- url: https://www.example.com/
  output: example.png
  height: 800

More options are available, as described in the shot-scraper README.
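
If you have a long list of pages, you could generate shots.yml rather than writing it by hand. The following is a minimal sketch, not part of the template: the `shots_yaml` helper is hypothetical, and it assumes plain values that need no YAML quoting (a URL containing ` #` or `: ` would need escaping that this sketch does not do).

```python
def shots_yaml(shots):
    """Render a list of dicts as the simple YAML list used by shots.yml.

    Hypothetical helper: values are written as-is, with no quoting.
    """
    lines = []
    for shot in shots:
        for index, (key, value) in enumerate(shot.items()):
            # The first key of each entry starts a new list item.
            prefix = "- " if index == 0 else "  "
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(shots_yaml([
        {"url": "https://simonwillison.net/", "output": "shot.png", "height": 800},
        {"url": "https://www.example.com/", "output": "example.png", "height": 800},
    ]), end="")
```

Run as a script, this prints the same two-entry file shown above.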

How this works

This whole system is built around a single GitHub Actions workflow, in .github/workflows/shots.yml.

Here’s an annotated copy of that workflow showing how it all works.

name: Take screenshots

on:
  push:
  workflow_dispatch: 

The workflow triggers when a change is pushed to the repository (including edits to the shots.yml file) or when the user manually clicks “Run workflow”.

jobs:
  shot-scraper:
    runs-on: ubuntu-latest
    if: ${{ github.repository != 'simonw/shot-scraper-template' }}

This is the trick that makes everything else work, which I picked up from Bruno Rocha last year. It ensures that this workflow job only runs on copies of the template, not on the original template repository itself.

This is important because a later step creates a file in the repository, if it doesn’t yet exist, based on the description URL provided by the user.

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.10
      uses: actions/setup-python@v2
      with:
        python-version: "3.10"
    - uses: actions/cache@v2
      name: Configure pip caching
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
        restore-keys: |
          ${{ runner.os }}-pip-

This is boilerplate that I use in most of my GitHub Actions workflows: it sets up Python 3.10, and also configures a cache so that the Python requirements listed in requirements.txt persist from one invocation to the next without needing to be re-downloaded from PyPI.
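
To make the caching behaviour concrete, here is a rough Python sketch of how a key like that behaves. It is an illustration under stated assumptions, not GitHub’s implementation: a plain SHA-256 of the file contents stands in for `hashFiles()` (the real digest will differ), and `pip_cache_key` and `pick_cache` are hypothetical names.

```python
import hashlib
from typing import List, Optional


def pip_cache_key(runner_os: str, requirements: bytes) -> str:
    """Build a key shaped like '<os>-pip-<hash of requirements.txt>'.

    Assumption: a plain SHA-256 stands in for GitHub's hashFiles().
    """
    digest = hashlib.sha256(requirements).hexdigest()
    return f"{runner_os}-pip-{digest}"


def pick_cache(key: str, restore_prefix: str, stored_keys: List[str]) -> Optional[str]:
    """Prefer an exact key match, else fall back to a key with the prefix."""
    if key in stored_keys:
        return key
    # Assumption for this sketch: the newest entries are at the end of the list.
    for candidate in reversed(stored_keys):
        if candidate.startswith(restore_prefix):
            return candidate
    return None


if __name__ == "__main__":
    old = pip_cache_key("Linux", b"shot-scraper\n")
    new = pip_cache_key("Linux", b"shot-scraper\ndatasette\n")
    # Editing requirements.txt changes the key, so there is no exact match,
    # but the restore-keys prefix still finds the older cache to start from.
    print(old != new, pick_cache(new, "Linux-pip-", [old]) == old)
```

The point of the prefix fallback is that a changed requirements.txt still starts from the previous cache instead of an empty one.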

    - name: Cache Playwright browsers
      uses: actions/cache@v2
      with:
        path: ~/.cache/ms-playwright/
        key: ${{ runner.os }}-browsers

shot-scraper uses Microsoft’s open source Playwright browser automation tool. Playwright works by installing its own full Chromium browser. This step configures a cache for that browser, so that future invocations of the Action don’t need to download another copy.

    - name: Install dependencies
      run: |
        pip install -r requirements.txt
    - name: Install Playwright dependencies
      run: |
        shot-scraper install

The pip install line here installs the shot-scraper CLI tool, which is written in Python.

The shot-scraper install line then triggers the Playwright mechanism to download and install the browser. This will do nothing if the browser has already been cached.

    - uses: actions/github-script@v6
      name: Create shots.yml if missing on first run
      with:
        script: |
          const fs = require('fs');
          // Only write the file on the first run, when it does not exist yet
          if (!fs.existsSync('shots.yml')) {
              const desc = context.payload.repository.description;
              let line = '';
              if (desc && (desc.startsWith('http://') || desc.startsWith('https://'))) {
                  line = `- url: ${desc}` + '\n  output: shot.png\n  height: 800';
              } else {
                  // No usable URL in the description: write a commented-out example
                  line = '# - url: https://www.example.com/\n#   output: shot.png\n#   height: 800';
              }
              fs.writeFileSync('shots.yml', line + '\n');
          }

This is the other key piece of magic. It uses GitHub’s github-script action, which provides a Node.js environment with a context object containing details about the Actions run.

It starts by reading the repository description from context.payload.repository.description.

Then it creates a shots.yml file based on that description, but only if the file doesn’t exist already.

If the repository description is missing, or doesn’t start with http:// or https://, it writes a commented-out configuration instead, which looks like this:

# - url: https://www.example.com/
#   output: shot.png
#   height: 800
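
For readers more comfortable with Python than Node.js, the decision that step makes can be sketched as a pure function. This is an illustrative translation, not code from the template, and `initial_shots_yml` is a hypothetical name.

```python
from typing import Optional


def initial_shots_yml(description: Optional[str]) -> str:
    """Return the initial shots.yml content for a repository description.

    Mirrors the github-script step: a description that looks like an
    http(s) URL becomes a real entry, anything else becomes a
    commented-out example.
    """
    if description and description.startswith(("http://", "https://")):
        return f"- url: {description}\n  output: shot.png\n  height: 800\n"
    return (
        "# - url: https://www.example.com/\n"
        "#   output: shot.png\n"
        "#   height: 800\n"
    )


if __name__ == "__main__":
    print(initial_shots_yml("https://simonwillison.net/"), end="")
    print(initial_shots_yml("My screenshot repo"), end="")
```

The file-existence check is left out here; in the workflow it is what stops later runs from overwriting your edits.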

The next step is to take the screenshots:

    - name: Take shots
      run: |
        shot-scraper multi shots.yml

shot-scraper multi runs through the YAML file and takes each of the screenshots configured there in turn; see the shot-scraper documentation for details.
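
One way to picture what the YAML file asks for is as a series of single-screenshot commands. The sketch below is only an approximation for illustration: it is not how shot-scraper multi is implemented, `single_shot_commands` is a hypothetical helper, and it handles only the url, output and height keys used in this post.

```python
def single_shot_commands(shots):
    """Translate shots.yml-style entries into shot-scraper command lines.

    Each entry is a dict such as
    {"url": "...", "output": "shot.png", "height": 800}.
    """
    commands = []
    for shot in shots:
        command = ["shot-scraper", shot["url"]]
        if "output" in shot:
            command += ["-o", shot["output"]]
        if "height" in shot:
            command += ["--height", str(shot["height"])]
        commands.append(command)
    return commands


if __name__ == "__main__":
    for command in single_shot_commands([
        {"url": "https://simonwillison.net/", "output": "shot.png", "height": 800},
        {"url": "https://www.example.com/", "output": "example.png", "height": 800},
    ]):
        print(" ".join(command))
```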

The final step is to commit and push the new shots.yml and shot.png files to the repository:

    - name: Commit and push
      run: |-
        git config user.name "Automated"
        git config user.email "actions@users.noreply.github.com"
        git add -A
        timestamp=$(date -u)
        git commit -m "${timestamp}" || exit 0
        git pull --rebase
        git push

This uses a pattern I described in a TIL.

GitHub Actions as a platform

I tweeted this the other day, shortly before I came up with the idea for the shot-scraper-template repository.

Honestly think GitHub Actions might be my favourite serverless platform right now

– Simon Willison (@simonw) March 13, 2022

This project demonstrates why. The number of complex moving parts involved in shot-scraper-template is fairly bewildering, but the end result is a free tool that anyone can use to start taking automated screenshots.

And it doesn’t cost me anything to provide it to them either!

Written by Charlie Layers