Wikipedia, the world’s online encyclopedia, is a useful, volunteer-curated source of information. It’s often said that the most valuable parts of a given Wikipedia article are not the user-contributed topic explanations, but the references on which those explanations are based.
Wikiref aims to make the process of extracting these references for later review or analysis dead simple. It operates as a Firefox (and soon to be Chrome!) browser extension that is only active when you’re on a Wikipedia page.
Currently, Wikiref is not in the Firefox Add-ons store, so you can only add it as a temporary add-on. I plan to apply to get it listed in the extension store, but in the meantime, here are instructions for installing it as a temporary add-on in Firefox:
- Clone the repo:
  git clone https://github.com/zaataylor/wikiref.git
- Navigate to about:debugging
- Select “This Firefox”
- Click “Load Temporary Add-on”
- Find the location of the cloned Wikiref repository from the dropdown and click on any file in the extension directory.
- Navigate to a Wikipedia page and have fun! 🙂
We’ll illustrate how to use Wikiref by extracting some references from this Wikipedia page about dynamic arrays:
Selecting References
We can capture specific references on the page by first clicking the Wikiref popup in the browser toolbar, then entering Select Mode by clicking “Select References”.
Next, we can scroll down to the “References” or “External Links” sections and click on the UI element of interest. The item will be highlighted and encircled by a solid black border to make it clear what is currently selected.
Since the item we’ll click on is likely a list item of some sort, we can optionally expand our selection to encompass all of the list items contained in the same list as the item that was originally clicked on. This makes extracting an entire section of references from a specific part of the page very easy.
After selecting a reference or section of references, we can extract the text and external links associated with these references by clicking the green ✓ that appears under the selected items. If we want to change our current selection instead of capturing the currently selected element(s), we can simply click on the new element we want to select. Alternatively, if we want to deselect the currently selected references, we can press the red ✕ that appears under the selected element(s). The screen capture below illustrates all of these features in the order: extract references, change selection, and cancel selection.
Displaying and Editing Selected References
If we’ve captured a set of references and want to see what they look like in tabular form, along with any external links they contain, we can enter Display Mode. This will insert a table into the page.
Since the reference titles are pulled directly from the HTML of the page, we may notice that some titles include Wikipedia page navigation indicators such as “^”, or increasing character sequences like “a b c” that indicate multiple citations of a particular reference. This can be annoying if we’re just trying to capture the actual reference’s text, which is why Display Mode also enables us to edit the text of references.
To edit references in Display Mode, click the pencil icon in the top right of the table. Once a reference’s text has been changed, use the tab character to finalize the edit. I am not a skilled web developer, and the Edit Mode view may not be the most visually appealing or well-designed user experience. I consider it a work in progress, and I welcome user feedback to help make it better!
After we’ve made all of our edits, we can exit Edit Mode by clicking the pencil icon again, which should revert the icon’s appearance to its original form. From here, we can either download the edited references as JSON by clicking the export icon to the left of the pencil icon, or exit Display Mode entirely by clicking the ✕ icon to the right of the pencil icon.
Deleting References
If we decide we want to start over and delete the references we’ve previously captured, we can select “Delete References” in the popup UI, which will remove the references from localStorage.
Downloading References
If we’re satisfied with the references we’ve currently captured/edited and want to download these references (text and any external links) as JSON, we can do so simply by clicking “Download References” in the popup UI.
This will create a JSON file named after the lowercased last portion of the document.baseURI of the current page (splitting on / and ignoring document sections indicated by #). For instance, if the current page (and section) we’ve navigated to and captured references on is https://en.wikipedia.org/wiki/Hard_disk_drive#References, downloading the references will generate hard_disk_drive.json.
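The naming rule above can be sketched in a few lines of JavaScript. This is only an illustration of the rule as described, not Wikiref’s actual code; the function name referencesFileName is made up for this example.

```javascript
// Hypothetical helper (not taken from the Wikiref source): derives the
// download file name from a page URI, following the rule described above.
function referencesFileName(baseURI) {
  // Ignore the document section, i.e. everything from "#" onward.
  const withoutSection = baseURI.split("#")[0];
  // Split on "/" and drop empty segments (from "https://" or a trailing "/").
  const segments = withoutSection.split("/").filter((s) => s.length > 0);
  const lastSegment = segments[segments.length - 1];
  return lastSegment.toLowerCase() + ".json";
}

console.log(
  referencesFileName("https://en.wikipedia.org/wiki/Hard_disk_drive#References")
);
// prints "hard_disk_drive.json"
```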
Architecture
Wikiref is made up of three components, following the pattern used by extensions: background, popup, and content.
Background
- background.js: This script contains the logic for actually executing a download of references after receiving a message from the content script. It primarily consists of a handleMessage event listener that currently just listens for messages related to downloads, but could easily be extended later on to encompass other kinds of events.
Popup
- popup.js: Logic in this script listens for clicks on the extension popup and sends specific messages to the content script running in the active tab of the current window, triggering different extension modes such as Select Mode and Display Mode.
- popup.css: Styling for the popup.
- wikiref.html: Skeleton of the popup.
Content
- wikiref.js: The “brain” behind Wikiref. Contains all of the logic for selecting, extracting, displaying, initiating download of, and editing references.
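As a rough illustration of how a listener like the handleMessage one in background.js can stay open to new kinds of events, the sketch below routes messages through a lookup table keyed by message type. The message shape ({ type, fileName, references }), the "download" type string, and the createMessageHandler helper are all assumptions made for this example, not details taken from the Wikiref source.

```javascript
// Hypothetical sketch; the message shape and all names are assumptions.
function createMessageHandler(handlers) {
  return function handleMessage(message) {
    // Only dispatch to handlers that were explicitly registered.
    if (!Object.prototype.hasOwnProperty.call(handlers, message.type)) {
      return; // unknown kind of message: ignore it
    }
    handlers[message.type](message);
  };
}

// Supporting a new kind of event later only means adding an entry here.
const handleMessage = createMessageHandler({
  download: (message) => {
    const json = JSON.stringify(message.references, null, 2);
    // A real extension would start a download here instead of logging.
    console.log("would save " + message.fileName + " (" + json.length + " characters)");
  },
});

handleMessage({ type: "download", fileName: "hard_disk_drive.json", references: [] });
```

In an actual extension, a function like this would be registered with browser.runtime.onMessage.addListener, and the download entry would hand the JSON to the browser’s downloads API rather than logging it.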
References
Anatomy of a Reference
A reference is represented by a relatively simple data structure. It is a JavaScript object that consists of text, links, hash, and id properties:
- text is a string containing the text of the reference as it appears on the topic page.
- links is an array containing the href values of each external link included in the specific reference text; currently, non-external links aren’t captured.
- hash is a SHA-1 hash of the normalized document.baseURI concatenated with text, using the | character as a separator.
- id is an incrementing integer value that indicates the order in which a reference was extracted.
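To make the structure concrete, here is a rough sketch of how such an object could be assembled. It is an illustration under stated assumptions rather than Wikiref’s implementation: the helper names (normalizeURI, hashInput, sha1Hex, buildReference) are invented here, and “normalized” is assumed to mean dropping the #section part of the URI, which the description above doesn’t spell out. The hashing itself uses the standard Web Crypto API.

```javascript
// All names below are hypothetical; they are not taken from the Wikiref source.

// Assumption: "normalizing" the URI means dropping the "#section" suffix.
function normalizeURI(baseURI) {
  return baseURI.split("#")[0];
}

// Builds the string that gets hashed: normalized URI, then "|", then the text.
function hashInput(baseURI, text) {
  return normalizeURI(baseURI) + "|" + text;
}

// SHA-1 via the Web Crypto API, returned as a lowercase hex string.
async function sha1Hex(input) {
  const bytes = new TextEncoder().encode(input);
  const digest = await crypto.subtle.digest("SHA-1", bytes);
  return Array.from(new Uint8Array(digest))
    .map((byte) => byte.toString(16).padStart(2, "0"))
    .join("");
}

// Assembles a reference object with the four properties described above.
async function buildReference(baseURI, text, links, id) {
  const hash = await sha1Hex(hashInput(baseURI, text));
  return { text, links, hash, id };
}
```

Hashing the page URI together with the text means the same citation text captured on two different pages would get two different hashes under this scheme.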
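Capturing References
The algorithm for capturing references is relatively straightforward. Here’s the sequence of steps it follows:
- User enters Select Mode by clicking “Select References” in the extension popup UI. The extension adds a click event listener to document.body and changes the style of the cursor to pointer so it’s more intuitive to the user that they can now click on references.
- User clicks on a particular reference item. The extension starts from event.target, then traverses up the element.parentElement lineage for the clicked element until it finds an enclosing list item. This parent element is marked as the actual element from which reference information will be extracted. This makes it easy to consistently apply styles to a selected reference regardless of what part of the reference is clicked, since clicks are really “bubbled up” to the parent list item containing the clicked element.
- The selected item is highlighted, and a small set of controls (✓ for extracting, Expand Selection for selecting every item in the list that the item is contained in, and ✕ for cancelling selection) is inserted directly under the highlighted item.
- When a user clicks on the green ✓ icon, possibly after having expanded selection to all items in a list by clicking Expand Selection, each selected item in the list is passed to extractReference(child, index), which has a relatively straightforward implementation: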
/**
* Extracts a reference from a child element.
* This should be an
*/
async function extractReference(child, index) {
var ref = {