Show HN: Curate your own search engine from pages you browse and bookmark

💾 DiskerNet – an internet on yer Disk

DiskerNet (codename PROJECT 22120) is an archivist browser controller that caches everything you browse, and a library server with full text search to serve your archive.

Now with full text search over your archive.

This feature was just released in version 2, so it will improve over time.

And one more thing…

Coming to a future release, soon! The ability to publish your own search engine that you curated with the best resources, based on your expert knowledge and experience.

Build your own binaries:

$ git clone https://github.com/i5ik/DiskerNet
$ cd DiskerNet
$ npm i
$ ./scripts/build_setup.sh
$ ./scripts/compile.sh
$ cd bin/

License

22120 is licensed under Polyform Strict License 1.0.0 (no modification, no distribution). You can purchase a license for personal, research, or noncommercial purposes.

About

This project literally makes your web browsing available COMPLETELY OFFLINE. Your browser doesn’t even know the difference. It’s literally that amazing. Yes.

Save your browsing, then switch off the net and go to http://localhost:22120 and switch mode to serve, then browse what you browsed before. It all still works.

Warning: if you have Chrome open, it will close it automatically when you open 22120, and relaunch it. You may lose any unsaved work.

Get 22120

3 ways to get it:

  1. Get a binary from the releases page, or
  2. Run with npx: npx archivist1@latest, or
    • npm i -g archivist1@latest && exlibris
  3. Clone this repo and run as a Node.JS app: npm i && npm start

Using

Pick save mode or serve mode

Go to http://localhost:22120 in your browser,
and follow the instructions.

Exploring your 22120 archive

The archive will be located in 22120-arc/public/library*

But it’s not public, don’t worry!

You can also check out the archive index, for a listing of every title in the archive. The index is accessible from the control page, which by default is at http://localhost:22120 (unless you changed the port).

*Note: 22120-arc is the archive root of a single archive, and by default it is placed in your home directory. But you can change the parent directory for 22120-arc to have multiple archives.

Format

The archive format is:

22120-arc/public/library//.json

Inside the JSON file is a JSON object with headers, response code, key and a base64-encoded response body.
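As a rough illustration of that description, here is a minimal Python sketch of what such a record could look like and how the body round-trips through base64. The field names (key, responseCode, headers, body) and the "METHOD URL" key string are assumptions made for the example; the actual names 22120 uses are not stated here.

```python
import base64
import json


def make_record(method, url, status, headers, body_bytes):
    """Build one archive record as a plain dict.

    Field names are hypothetical; only the four kinds of data
    (key, response code, headers, base64 body) come from the text.
    """
    return {
        "key": f"{method} {url}",  # assumed key shape
        "responseCode": status,
        "headers": headers,
        # The body is stored as base64 text so any bytes fit in JSON.
        "body": base64.b64encode(body_bytes).decode("ascii"),
    }


def decode_body(record):
    """Recover the original response bytes from a record."""
    return base64.b64decode(record["body"])


record = make_record(
    "GET",
    "https://example.com/",
    200,
    {"content-type": "text/html"},
    b"<h1>hi</h1>",
)

# Serialise to JSON text (what would sit in the .json file) and read it back.
restored = json.loads(json.dumps(record))
original_bytes = decode_body(restored)
```

Because the body is only encoded, never rewritten, the bytes that come back out are the same bytes that went in.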

Why not WARC (or another format like MHTML)?

The case for the 22120 format.

Other formats (like MHTML and SingleFile) save translations of the resources you archive. They make modifications, such as altering the internal structure of the HTML, changing hyperlinks and URLs into “flat” embedded data URIs or local references, and require other “hacks” in order to save a “perceptually similar” copy of the archived resource.

22120 throws all that out, and calls rubbish on it. 22120 saves a verbatim high-fidelity copy of the resources you archive. It doesn’t alter their internal structure in any way. Instead it records each resource in its own metadata file. In that way it’s more similar to HAR and WARC, but still radically different. Compared to WARC and HAR, our format is radically simplified, throwing out most of the metadata information and unnecessary fields these formats collect.

Why?

At 22120, we believe in the resources and in verbatim copies. We don’t anoint ourselves as all-knowing enough to modify the resource source of truth before we archive it, just so it can “fit the format” we choose. We don’t believe we should decorate with obtuse and superfluous metadata. We don’t believe we should be modifying or altering resources we archive. We believe we should save them exactly as they were presented. We believe in simplicity. We believe the format should fit (or at least accommodate, and be suited to) the resource, not the other way around. We don’t believe in conflating metadata with content, so we separate them. We believe separating metadata and content, and keeping the content pure and unaltered throughout the archiving process, is not only the right thing to do, it simplifies every part of the audit trail, because we know that the differences between archived copies of a resource are due to changes to the resources themselves, not artefacts of the format or archiving process.

Both SingleFile and MHTML require mutilating modifications of the resources so that the resources can be “forced to fit” the format. At 22120, we believe this is not required (and in any case should never be performed). We see it as akin to lopping off the arms of a Roman statue in order to fit it into a presentation and security display box. How ridiculous! The web may be a more “pliable” medium, but that doesn’t mean we should treat it without respect for its inherent content.

Why is changing the internal structure of resources so bad?

In our view, the internal structure of the resource as presented is the canon. Internal structure is not just substitutable “presentation” – no, in fact it encodes vital semantic information such as hyperlink relationships, source choices, and the “strokes” of the resource author as they create their content, even if it’s mediated through a web server or web framework.

Why else is 22120 the obvious and natural choice?

22120 also archives resources exactly as they are sent to the browser. It runs connected to a browser, and so is able to access the full scope of resources the browser receives (with, for now, the exception of video, audio and websockets) in their highest fidelity, without modification, and is able to archive them in the exact format presented to the user. Many resources undergo presentational and processing changes before they are presented to the user. This is the ubiquitous “web app”, where client-side scripting enabled by JavaScript creates resources and resource views on the fly. These sorts of “hyper resources” or “realtime” or “client side” resources, prevalent in SPAs, are not able to be archived, at least not using the normal archive flow, within traditional wget-based archiving tools.

In short, the web is an online medium, and it should be archived and presented in the same fashion. 22120 archives content exactly as it is received and presented by a browser, and it also replays that content exactly as if the resource were being taken from online. Yes, it requires a browser for this exercise, but that browser need not be connected to the internet. It’s only natural that viewing a web resource requires a web browser. And because of 22120 the browser doesn’t know the difference! Resources presented to the browser from a remote web site, and resources given to the browser by 22120, are seen by the browser as exactly the same. This ensures that the people viewing the archive are also not let down and are given the chance to have the exact same experience as if they were viewing the resource online.

How it works

Uses the DevTools protocol to intercept all requests, and caches responses against a key made of (METHOD and URL) onto disk. It also maintains an in-memory set of keys so it knows what it has on disk.
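The caching idea can be sketched in a few lines of Python. This is not 22120’s code (the project runs on Node.js and talks to the browser over the DevTools protocol); the interception step is left out entirely, and the class name ResponseCache, the SHA-1 file naming and the record field names are all hypothetical.

```python
import base64
import hashlib
import json
import tempfile
from pathlib import Path


class ResponseCache:
    """Toy disk cache keyed by (METHOD, URL) with an in-memory key set."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.keys = set()  # what we believe is on disk

    @staticmethod
    def make_key(method, url):
        # Assumed key shape: upper-cased method, a space, then the URL.
        return f"{method.upper()} {url}"

    def _path(self, key):
        # Hypothetical naming scheme: hash the key to get a safe file name.
        name = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.root / f"{name}.json"

    def save(self, method, url, status, headers, body):
        key = self.make_key(method, url)
        record = {
            "key": key,
            "responseCode": status,
            "headers": headers,
            "body": base64.b64encode(body).decode("ascii"),
        }
        self._path(key).write_text(json.dumps(record))
        self.keys.add(key)

    def load(self, method, url):
        key = self.make_key(method, url)
        if key not in self.keys:  # cheap check, no disk access on a miss
            return None
        record = json.loads(self._path(key).read_text())
        body = base64.b64decode(record["body"])
        return record["responseCode"], record["headers"], body


cache = ResponseCache(tempfile.mkdtemp())
cache.save("GET", "https://example.com/", 200, {"content-type": "text/html"}, b"hello")

hit = cache.load("get", "https://example.com/")    # same key after upper-casing
miss = cache.load("POST", "https://example.com/")  # different method, so no entry
```

In save mode an interceptor would call save for each response it sees; in serve mode it would call load and answer the request from disk when there is a hit.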

FAQ

Do I need to download something?

Yes. But… if you like 22120, you might love the clientless hosted version coming in future. You’ll be able to create your archives online from any device, without any download, then download the archive to run on any desktop. You’ll need to sign up to use it, but you can jump the queue and sign up today.

Can I use this with a browser that’s not Chrome-based?

No.

How does this interact with ad blockers?

Interacts just fine. The things ad blockers stop will not be archived.

How secure is running Chrome with the remote debugging port open?

Seems pretty secure. It’s not exposed to the public internet, and pages you load that try to use it can’t use the protocol for anything (except to open a new tab, which they can do anyway). It seems there’s a potential risk from malicious browser extensions, but we’d need to confirm that and, if that’s so, work out blocks. See this useful security related post for some info.

Is this free?

Yes, this is totally free to download and use for personal non-commercial use. If you want to modify or distribute it, or use it commercially (either internally or for customer functions), you need to purchase a Noncommercial, internal use, or SMB license.

What if it can’t find my Chrome?

See this useful issue.

What’s the roadmap?

  • Full text search
  • Library server to serve archive publicly.
  • Distributed p2p web browser on IPFS

What about streaming content?

The following are probably hard (and I haven’t thought much about them):

  • Streaming content (audio, video)
  • “Impure” request response pairs (such as if you call GET /endpoint 1 time you get “A”, if you call it a second time you get “AA”, and other examples like this).
  • WebSockets (how to capture and replay that faithfully?)

Probably some way to do this tho.

Can I blacklist domains to not archive them?

Yes! Put any domains into 22120-arc/no.json*, eg:

[
  "*.horribleplantations.com",
  "*.cactusfernfurniture.com",
  "*.gustymeadows.com",
  "*.nytimes.com",
  "*.cnn.co?"
]

Will no longer cache any resource with a host matching these. Wildcards:

  • * (0 or more anything) and
  • ? (0 or 1 anything)

*Note: the no file is per-archive. 22120-arc is the archive root of a single archive, and by default it is placed in your home directory. But you can change the parent directory for 22120-arc to have multiple archives, and each archive requires its own no file, if you want a blacklist in that archive.
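The two wildcard rules above can be read as a small pattern language. Here is one possible interpretation as a Python sketch; it assumes the whole host must match the pattern and that matching ignores case, neither of which is stated above, and the helper names are made up for the example. Note that ? here means zero or one character, which differs from the usual shell glob where it means exactly one.

```python
import re


def pattern_to_regex(pattern):
    """Translate a no.json-style pattern into a compiled regex.

    '*' becomes '.*' (0 or more of anything),
    '?' becomes '.?' (0 or 1 of anything),
    everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def is_blocked(host, patterns):
    """True if the whole host matches any pattern (an assumed rule)."""
    return any(pattern_to_regex(p).fullmatch(host) for p in patterns)


NO_LIST = ["*.nytimes.com", "*.cnn.co?"]

print(is_blocked("www.nytimes.com", NO_LIST))    # True
print(is_blocked("edition.cnn.com", NO_LIST))    # True: '?' matches the 'm'
print(is_blocked("edition.cnn.co", NO_LIST))     # True: '?' matches nothing
print(is_blocked("edition.cnn.co.uk", NO_LIST))  # False: '?' allows at most one extra character
print(is_blocked("example.org", NO_LIST))        # False
```

Under this reading a bare nytimes.com would not match *.nytimes.com, because the pattern requires the dot; whether 22120 itself behaves that way is not something this sketch can tell you.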

Is there a DEBUG mode for troubleshooting?

Yes, just make sure you set an environment variable called DEBUG_22120 to anything non-empty.

So for example in POSIX systems:

$ export DEBUG_22120=True

Can I version the archive?

Yes! But you need to use git for versioning. Just initiate a git repo in your archive directory. And when you want to save a snapshot, make a new git commit.

Can I change the archive path?

Yes, there’s a control for changing the archive path in the control page: http://localhost:22120

Can I change this other thing?

There are a few command line arguments. You’ll see the format printed as the first printed line when you start the program.

For other things you can examine the source code.
