Written by Roy van Rijn (royvanrijn.com) on Mar 23, 2022 22:23:03
Or a clickbait title:
How I became the world’s most prolific DJ, using code.
This week I stumbled across a cool project: All The Music.
Damien Riehl (programmer/copyright attorney) and Noah Rubin (programmer) decided to generate all possible songs of length 12 using the 8 notes of the major scale (C4, D4, E4, F4, G4, A4, B4 and C5). All these songs were ‘freely’ released under a ‘Creative Commons’ license. Their goal is to prevent copyright claims on melodies.
While watching their amazing TED talk and hearing about the challenges they faced generating these songs, my head immediately made some connections. They generated all songs of length $n=12$ using $k=8$ notes; this amounts to a staggering $k^n = 8^{12} = 68{,}719{,}476{,}736$ unique songs.
All these songs are 12 notes long and each has its own MIDI file, which adds even more overhead. The size of this dataset is astronomical: 1.2TB compressed using GZIP.
That’s when I got an idea: maybe we can use a de Bruijn sequence for this?
I’ve blogged about these sequences before; in short, a de Bruijn sequence is an optimal (shortest possible) way to arrange a set of symbols into a single cyclic sequence so that every combination of a given length appears in it exactly once.
For example, if we want all combinations of 0, 1 and 2 of length 4, the naïve way would be to list them:
0000 0001 0002 0010 0011 0012 (etc)
Instead, building a de Bruijn sequence gives us:
Every possible length-4 combination is present in this single line (check them!).
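The post doesn’t include the generator itself, so here is a minimal sketch, not the author’s code, using the classic Lyndon-word (FKM) construction of a de Bruijn sequence for an alphabet of k symbols and window length n, plus a brute-force check that every length-n combination appears exactly once when the sequence is read cyclically. The function names are hypothetical.

```python
def de_bruijn(k: int, n: int) -> list[int]:
    """Cyclic de Bruijn sequence B(k, n) via the Lyndon-word (FKM) algorithm.

    Builds the whole sequence in memory: fine for toy sizes, not for B(8, 12).
    """
    a = [0] * (n + 1)  # a[1..n] holds the current prenecklace
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            # Emit the Lyndon word a[1..p] when its length divides n
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def covers_all_cyclically(seq: list[int], k: int, n: int) -> bool:
    """True if every length-n combination over k symbols occurs exactly once (cyclically)."""
    length = len(seq)
    windows = {tuple(seq[(i + j) % length] for j in range(n)) for i in range(length)}
    return length == k ** n and len(windows) == k ** n


seq = de_bruijn(3, 4)
print("".join(map(str, seq)))
print(len(seq), covers_all_cyclically(seq, 3, 4))  # 81 True
```

The length check matters: a sequence of exactly k^n symbols with k^n distinct windows can only contain each combination once, which is what makes it optimal.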
What if we could remix every possible 12-note melody into one giant megamix!?
That would make me mathematically the world’s most prolific DJ, remixing practically all existing songs, including EVERY song from the All The Music dataset, into one song.
I already have some very efficient code to generate these sequences. What if I output the sequence as a single MIDI file?
Because a de Bruijn sequence normally wraps around, if we want to create all melodies of length $n=12$ we need to append the first $n-1$ notes to the end of the sequence (which I’ve done above as well). This means we need just a single MIDI file with $8^{12} + 11 = 68{,}719{,}476{,}747$ notes in it.
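The wrap-around fix is easiest to see on a tiny case. A minimal sketch with hypothetical helper names, using the binary de Bruijn sequence 00010111, i.e. B(2, 3): appending its first n-1 symbols turns the cyclic sequence into a linear one in which every window is an ordinary substring, and the same arithmetic gives the note count for k=8, n=12.

```python
def linearize(cyclic: str, n: int) -> str:
    """Append the first n-1 symbols so windows that wrap around become plain substrings."""
    return cyclic + cyclic[:n - 1]


def windows(s: str, n: int) -> set[str]:
    """All length-n substrings of s (no wrap-around)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


cyclic = "00010111"             # binary de Bruijn sequence B(2, 3)
linear = linearize(cyclic, 3)
print(linear)                   # 0001011100
print(len(windows(linear, 3)))  # 8, i.e. every 3-bit pattern

k, n = 8, 12
print(k**n + (n - 1))           # 68719476747
```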
This gave me a small problem: a MIDI file stores its ‘LENGTH’ field in just 4 bytes, and the largest value that fits in 4 bytes is $2^{32}-1 = 4{,}294{,}967{,}295$. That length is counted in bytes, and every note needs several bytes of MIDI events, so we’ve hit a technical limit: we can’t fit our remix into a single MIDI file.
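To make the limit concrete: a Standard MIDI File track chunk starts with the ASCII ID `MTrk` followed by a 32-bit big-endian length in bytes. This is a minimal sketch using Python’s `struct` module with a hypothetical helper name; the 8-bytes-per-note figure is an assumption (a note-on and a note-off event, each with a one-byte delta time), but even at one byte per note the sequence would not fit.

```python
import struct

MAX_CHUNK_LENGTH = 2**32 - 1  # 4,294,967,295: largest unsigned 32-bit value


def track_chunk_header(length: int) -> bytes:
    """Build an 'MTrk' chunk header: 4-byte ID + 4-byte big-endian length."""
    if not 0 <= length <= MAX_CHUNK_LENGTH:
        raise ValueError(f"track length {length:,} bytes does not fit in 4 bytes")
    return struct.pack(">4sI", b"MTrk", length)


print(track_chunk_header(MAX_CHUNK_LENGTH).hex())  # 4d54726bffffffff

notes = 68_719_476_747
# 1 byte per note is an impossible lower bound; 8 is the assumed realistic size
for bytes_per_note in (1, 8):
    try:
        track_chunk_header(notes * bytes_per_note)
    except ValueError as err:
        print(err)
```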
To solve this I decided to split the one song up into a series of ‘smaller’, more manageable songs. In the end I settled on 2052 unique songs that together form one giant megamix album. This album contains every possible song of length 12 with the notes C4, D4, E4, F4, G4, A4, B4 and C5, the same as contained in ATM’s dataset.
When breaking up a de Bruijn sequence, each new song has to repeat the last $n-1$ notes of the previous song, so that each melody is contained in full. For example, if we split the above sequence into two parts we get:
Song 1: 000012200210002212021211212222011221022211012
Song 2: 101001011112001102111002012010202202
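The splitting rule can be sketched as follows (hypothetical helpers, not the author’s code): consecutive chunks overlap by n-1 symbols, so every length-n window of the original sequence lands entirely inside at least one chunk. It is shown here on the linearized binary sequence B(2, 3) rather than the ternary example.

```python
def split_with_overlap(seq: str, chunk_len: int, n: int) -> list[str]:
    """Split seq into chunks of at most chunk_len symbols; each chunk after the
    first starts with the last n-1 symbols of the previous chunk."""
    if chunk_len < n:
        raise ValueError("chunk_len must be at least n")
    step = chunk_len - (n - 1)
    chunks = []
    start = 0
    while True:
        chunks.append(seq[start:start + chunk_len])
        if start + chunk_len >= len(seq):
            return chunks
        start += step


def windows(s: str, n: int) -> set[str]:
    """All length-n substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


song = "0001011100"  # linearized binary de Bruijn sequence B(2, 3)
parts = split_with_overlap(song, 6, 3)
print(parts)  # ['000101', '011100']
print(set().union(*(windows(p, 3) for p in parts)) == windows(song, 3))  # True
```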
This results in the following:
- 1 remix album: debruijn8-12.tar
- Size: 16,735,957,504 bytes (16.75 GB on disk)
- 2052 GZIP-ed MIDI songs
- 2051 songs with a 33,500,000-note melody
- 1 song with a 10,999,308-note melody
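These numbers are consistent with the overlap rule. Assuming 33,500,000 notes per song and an overlap of n-1 = 11 notes, a quick sketch (hypothetical function name) of cutting up the 68,719,476,747-note sequence reproduces both the song count and the length of the final song:

```python
def plan_album(total_notes: int, chunk_len: int, n: int) -> tuple[int, int]:
    """Return (number of songs, notes in the last song) when a linear sequence
    of total_notes is cut into songs of chunk_len notes overlapping by n-1."""
    step = chunk_len - (n - 1)
    songs, start = 1, 0
    while start + chunk_len < total_notes:
        start += step
        songs += 1
    return songs, total_notes - start


k, n = 8, 12
total = k**n + (n - 1)  # 68,719,476,747 notes including the wrap-around
songs, last = plan_album(total, 33_500_000, n)
print(songs, last)  # 2052 10999308

# Notes actually stored: the whole sequence plus 11 repeated notes per song boundary
print((songs - 1) * 33_500_000 + last == total + (songs - 1) * (n - 1))  # True
```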
Let’s listen to some songs that are in the dataset (somewhere):
Example 1, Twinkle Twinkle:
Example 2, Jingle Bells:
Example 3, Can You Feel The Love Tonight:
Every possible 12-note melody is in the remix.
This got me thinking: why didn’t Damien and Noah go for this approach? It is much smaller and faster to generate (it took me a single morning).
So I turned to Twitter and asked Damien Riehl!
And sure enough, his answer makes total sense:
We had initially considered a “de Bruijn” sequence. But if we had used a single file, that would have had downsides:
If somebody infringes our work, it would only be a tiny fraction (0.0000000001%?) of the “work”, so somebody could argue “fair use”
Same idea with others incorporating ATM works in theirs (“small portion”)
So our technical/legal design is “One MIDI file per melody”, which I think is a legal feature, not a bug. 🙂
Of course I should have known there was a good reason. He encouraged me to continue anyway, and so I generated my own de Bruijn album. Now I can say I’ve officially remixed 68,719,476,736 songs. Is there a Guinness Book of World Records entry for me now?
If you’re curious what this remix sounds like, here’s a snippet:
It was a fun exercise! I really love de Bruijn sequences, and I learned a lot about streaming GZIP/File APIs to efficiently store everything (generating the whole sequence first isn’t an option).
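The post doesn’t show how the files were written. As a hedged illustration only, this sketch streams notes straight into a gzip-compressed, single-track MIDI file with Python’s `gzip` module, so the full sequence never has to sit in memory. The note mapping (C4 = MIDI note 60), tick resolution, velocity and the function name `write_song_gz` are all assumptions. Because the track length must precede the events and a gzip stream can’t be rewound to patch it, the caller supplies the note count up front.

```python
import gzip
import io
import struct
from typing import BinaryIO, Iterable, Union

# Assumption: C4 = MIDI note 60, so symbols 0..7 map to C4 D4 E4 F4 G4 A4 B4 C5
NOTE_NUMBERS = [60, 62, 64, 65, 67, 69, 71, 72]
TICKS_PER_QUARTER = 96  # < 128, so each delta time fits in one variable-length byte
VELOCITY = 64
BYTES_PER_NOTE = 8      # note-on (4 bytes) + note-off (4 bytes), delta times included
END_OF_TRACK = bytes([0x00, 0xFF, 0x2F, 0x00])


def write_song_gz(target: Union[str, BinaryIO], symbols: Iterable[int], note_count: int) -> None:
    """Stream symbols (0..7) into a gzip-compressed format-0 MIDI file.

    target may be a path or a writable binary file object.
    """
    track_len = note_count * BYTES_PER_NOTE + len(END_OF_TRACK)
    with gzip.open(target, "wb") as out:
        # Header chunk: length 6, format 0, one track, ticks per quarter note
        out.write(struct.pack(">4sIHHH", b"MThd", 6, 0, 1, TICKS_PER_QUARTER))
        # Track chunk header; struct raises an error if track_len exceeds 4 bytes
        out.write(struct.pack(">4sI", b"MTrk", track_len))
        written = 0
        for s in symbols:
            key = NOTE_NUMBERS[s]
            # Delta 0 + note-on, then a quarter-note delta + note-off
            out.write(bytes([0x00, 0x90, key, VELOCITY,
                             TICKS_PER_QUARTER, 0x80, key, 0x00]))
            written += 1
        if written != note_count:
            raise ValueError(f"expected {note_count} notes, got {written}")
        out.write(END_OF_TRACK)


buf = io.BytesIO()
write_song_gz(buf, [0, 2, 4, 5, 7], 5)  # C4 E4 G4 A4 C5
print(len(gzip.decompress(buf.getvalue())))  # 66 = 14 + 8 + 5 * 8 + 4
```

Writing one small event per call keeps the sketch readable; a real run over tens of millions of notes per song would likely batch events into larger buffers before each write.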
The album is, for now, only stored on my hard disk, but I’m working with Damien to get the songs added to their ATM collection on the Internet Archive.
Oh, and you can’t have a remix album without a good album cover: