About AudioBoom and Archive.org

As seen earlier in my AudioBoom account

After seeing (then hearing) a public Facebook post by Documentally earlier about some (inevitable) changes that AudioBoom (previously known as ‘audioboo’) are enforcing on their loyal users (i.e. “pay up”), I thought I’d try something I haven’t done for years: upload an mp3 file to Archive.org.

Turns out it was very easy.

Using cURL:

curl --location --header 'x-amz-auto-make-bucket:1' \
 --header 'x-archive-meta01-collection:test_collection' \
 --header 'x-archive-meta-mediatype:audio' \
 --header 'x-archive-meta-title:The Wilhelm Scream' \
 --header "authorization: LOW ##my-access-key-##:##my-secret-key-##" \
 --upload-file wilhelm.mp3 \
 http://s3.us.archive.org/kosso-tests-things/wilhelm-scream.mp3

Which now gives us:


and even


The ‘Item’ (aka bucket) is called kosso-tests-things. It was created because I specified x-amz-auto-make-bucket:1 and used that name as the folder in the URL at the end of the command. I also changed the filename by using a different name in the URL.
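For anyone who would rather not shell out to curl, the same PUT can be sketched in Python with only the standard library. This is a rough sketch and not Archive.org's official client: the helper names (ias3_url, ias3_headers, upload_file) are made up for illustration, the host and header names are simply the ones used in this post, and the remote filename wilhelm-scream.mp3 is inferred from the derivative URLs further down.

```python
import urllib.request

S3_HOST = "http://s3.us.archive.org"  # host as it appears in the URLs in this post


def ias3_url(item, remote_name):
    """Build the upload URL: the 'folder' is the Item, the rest is the filename."""
    return f"{S3_HOST}/{item}/{remote_name}"


def ias3_headers(access_key, secret_key, metadata=None, make_bucket=False):
    """Build the same kind of headers the curl command sends."""
    headers = {"authorization": f"LOW {access_key}:{secret_key}"}
    if make_bucket:
        # Only needed on the first upload, to create the Item.
        headers["x-amz-auto-make-bucket"] = "1"
    for key, value in (metadata or {}).items():
        headers[f"x-archive-meta-{key}"] = value
    return headers


def upload_file(local_path, item, remote_name, headers):
    """PUT a local file into the Item (hypothetical helper; makes a real network call)."""
    with open(local_path, "rb") as fh:
        data = fh.read()
    request = urllib.request.Request(
        ias3_url(item, remote_name), data=data, headers=headers, method="PUT"
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Nothing is sent here; this only shows what would go over the wire.
    headers = ias3_headers(
        "my-access-key",
        "my-secret-key",
        metadata={"mediatype": "audio", "title": "The Wilhelm Scream"},
        make_bucket=True,
    )
    print(ias3_url("kosso-tests-things", "wilhelm-scream.mp3"))
    for name, value in headers.items():
        print(f"{name}: {value}")
```

A second upload to the same Item would call ias3_headers without make_bucket, mirroring the second curl command.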

If I wanted to upload another file to that ‘Item’, I would just use:

curl --location --header 'x-archive-meta01-collection:test_collection' \
 --header 'x-archive-meta-mediatype:audio' \
 --header 'x-archive-meta-title:Another Audio File Test' \
 --header "authorization: LOW ##my-access-key-##:##my-secret-key-##" \
 --upload-file letsgo.mp3 \

The two audio files can now also be viewed and played on their page on archive.org here:



You can get an XML file which lists all the files in this ‘Item’ using:


A JSON format is provided here : (note ‘metadata’, not ‘details’ like the web url)
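Reading that JSON from a script might look like the sketch below. It assumes the listing lives at archive.org/metadata/ followed by the Item name (as the note about ‘metadata’ vs ‘details’ suggests) and that the response has a "files" list whose entries carry "name" and "source" keys. That response shape is my understanding rather than something shown in this post, and the function names are hypothetical.

```python
import json
import urllib.request


def metadata_url(item):
    """URL of the JSON listing for an Item ('metadata', not 'details')."""
    return f"https://archive.org/metadata/{item}"


def fetch_metadata(item):
    """Download and parse the JSON listing (makes a real network call)."""
    with urllib.request.urlopen(metadata_url(item)) as response:
        return json.load(response)


def original_files(metadata):
    """Names of the files that were uploaded, skipping generated derivatives.

    Assumes each entry in metadata["files"] has "name" and "source" keys.
    """
    return [
        entry["name"]
        for entry in metadata.get("files", [])
        if entry.get("source") == "original"
    ]


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, so nothing is fetched here.
    sample = {
        "files": [
            {"name": "wilhelm-scream.mp3", "source": "original"},
            {"name": "wilhelm-scream.ogg", "source": "derivative"},
            {"name": "wilhelm-scream.png", "source": "derivative"},
        ]
    }
    print(metadata_url("kosso-tests-things"))
    print(original_files(sample))
```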



It would be pretty easy for anyone to script a way to grab all your AudioBoom files and dump them in (what’s called) an ‘Item’ on archive.org. (Amazon S3 calls Items ‘buckets’)

According to their docs, ‘Items’ should not be over 100GB and should not contain more than 10,000 files. So I think one would do! 😉 You should also be able to script the attachment of the original metadata (locations, etc., even comments and text).

Their system automatically creates audio waveforms for players and ogg audio ‘derivative’ conversions, e.g. http://s3.us.archive.org/kosso-tests-things/wilhelm-scream.ogg and http://s3.us.archive.org/kosso-tests-things/wilhelm-scream.png

All the script would need is the links to your AudioBoom files (and their metadata). You’d also need an account on Archive.org and your special API ‘keys’ (via: http://archive.org/account/s3.php if logged in).

eg: Here’s a url with a JSON list of my audio uploads to AudioBoo(m) : https://api.audioboom.com/audio_clips?username=kosso 

Also no reason why that script shouldn’t also generate an RSS feed of the files (or last however many) and overwrite it each time a new file was added to the ‘Item’.
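Generating that feed needs nothing beyond the standard library. The sketch below is one way it could go, assuming each file is described by a dict with title, url, length (in bytes) and timestamp (epoch seconds). Those field names, the build_rss function and the example.org URLs are placeholders of mine, not anything Archive.org or AudioBoom provide.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_rss(title, link, items, limit=20):
    """Return an RSS 2.0 document (as a string) for the newest `limit` items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Audio files from {title}"

    # Newest first, then keep only the last however many.
    newest = sorted(items, key=lambda i: i["timestamp"], reverse=True)[:limit]
    for item in newest:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "guid").text = item["url"]
        ET.SubElement(node, "pubDate").text = formatdate(item["timestamp"], usegmt=True)
        ET.SubElement(
            node,
            "enclosure",
            url=item["url"],
            length=str(item["length"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder data; a real script would fill this from the Item's file list.
    files = [
        {"title": "First clip", "url": "http://example.org/first.mp3",
         "length": 12345, "timestamp": 1400000000},
        {"title": "Second clip", "url": "http://example.org/second.mp3",
         "length": 67890, "timestamp": 1400003600},
    ]
    print(build_rss("My audio", "http://example.org/", files))
```

Writing the returned string to the same filename after every upload gives the overwrite-each-time behaviour.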

Archive.org also has a Python library which will make life incredibly easy to script all this up, if snakes are your thing.

More docs for the Internet Archive’s S3-like server API, aka ‘IAS3’, here: https://github.com/vmbrasseur/IAS3API

I may have a crack at building a simple PHP and JavaScript powered WordPress plugin or something to do something interesting with their API too.

NOTE: The files mentioned in these examples will be deleted after 30 days or so, since they are in the ‘test_collection’ Collection.

UPDATE 1: I had a go at coding up a JavaScript-based solution. Thinking I could get it to a) list the booms b) download a file from AB c) Upload to AO.

You can see how far I got here, which will list any Audioboom user’s last 20 posts and allow you to play them. But that’s as far as I can get.

Alas, AudioBoom don’t have their CloudFront S3 storage set up to allow for this in their CORS (Cross Origin Resource Sharing) settings. That means a server script (PHP, Python, etc.) would have to do the job of downloading, acting as a ‘proxy’, since the browser won’t let the JavaScript do it directly.
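The server-side download half of that job is small. Here is a minimal sketch using only the Python standard library; the download function is a hypothetical helper, and it simply names the local file after the last part of the URL path. CORS doesn't come into it because the request never runs in a browser.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def download(url, dest_dir):
    """Stream a remote file into dest_dir and return the local path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local copy after the last segment of the URL path.
    name = Path(unquote(urlparse(url).path)).name
    target = dest_dir / name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all in memory
    return target
```

Calling download(clip_url, "downloads") for each clip URL would leave a local copy ready to be PUT to Archive.org.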

Or, they could add these settings to their CloudFront CORS setup: (I think!)

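<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule><AllowedOrigin>*</AllowedOrigin><AllowedMethod>GET</AllowedMethod></CORSRule> <!-- assumed minimal rule: let any site GET the files -->
</CORSConfiguration>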

Python is definitely the way to go.


Here’s a Python script I’ve just knocked up which will (to start with) download all your Audiobooms, including all the metadata (recorded date, original upload date, location, etc.) and photo and waveform images if they exist.



It shouldn’t take too much to set it to upload to Archive.org now, with all the data and photos too. I would upload to the opensource_audio collection. Set the item name to something like username-audioboos and be sure to initialise the Item metadata before kicking off the uploads.

For some reason, I found that the Archive.org API lets you set any custom metadata on an ‘Item’, but not on each of the arbitrary files within the Item. So the script makes a copy of the original JSON from Audioboom and uploads that for each recording.
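The copy-the-JSON-alongside-each-recording trick could look something like this. It's a minimal sketch and not the actual script: sidecar_path and write_sidecar are hypothetical names, and clip is assumed to be one recording's entry, already parsed from the Audioboom JSON into a dict.

```python
import json
from pathlib import Path


def sidecar_path(audio_path):
    """Same filename as the recording, but with a .json extension."""
    return Path(audio_path).with_suffix(".json")


def write_sidecar(audio_path, clip):
    """Save one recording's original metadata next to its audio file."""
    path = sidecar_path(audio_path)
    path.write_text(json.dumps(clip, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Both files then get uploaded to the Item, so the metadata travels with the audio even though the API has nowhere to store it per file.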

As long as the initial Item is set up with the mediatype:audio metadata, the main Archive.org page should show the audio files in their web player once uploaded. (I think). The photos and JSON files will be there, but you won’t see them on the Item page or player.

So, once all the files are there, Archive.org’s handy JSON or XML list files could be used by any other application someone might want to build, to offer a player or do whatever they like with the audio. It’s then just a matter of reading the JSON file that goes with each recording (it has the same filename) to parse out all the data from when it was originally uploaded to Audioboom.
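On the reading side, matching each recording to its JSON file is just a filename comparison. A small sketch, assuming the audio files end in .mp3 and their metadata copies share the name with a .json extension; pair_with_sidecars is a made-up name and the input is simply a list of filenames taken from the Item's listing.

```python
from pathlib import PurePosixPath


def pair_with_sidecars(names, audio_suffix=".mp3"):
    """Return (audio_name, json_name_or_None) pairs for every audio file."""
    available = set(names)
    pairs = []
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix.lower() != audio_suffix:
            continue  # skip photos, waveforms and the JSON files themselves
        sidecar = str(path.with_suffix(".json"))
        pairs.append((name, sidecar if sidecar in available else None))
    return pairs
```

A player could then fetch the JSON for each pair to show the original date, location and so on next to the audio.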

Easy! 🙂

ps: I think the massive bonus of Archive.org is that they also create an automatic Torrent file to download all or any of the files in the ‘Item’. Neat.