Requesting open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons

The Idea

A little while ago, my friend and fellow Wikipedia editor Andrew Gray (he’s the Wikipedian in Residence at the British Library!) mentioned to me that Wikipedia could do with more sound files. We discussed recordings of music, industrial and everyday sounds (what does a printing press sound like? Or a Volkswagen Beetle? What do different kinds of breakfast cereal sound like when milk is added?), as well as people’s voices, so that we have a record of what they sound like.

Beethoven’s Trumpet (With Ear) By John Baldessari, at the Saatchi Gallery.
Photo by Jim Linwood, on Flickr, CC-BY

In the spirit of Wikipedia, all such recordings would be open-licensed, to allow others to use them, freely. They can then be uploaded to Wikimedia Commons (the media repository for Wikipedia and its related projects) in an open format, namely Ogg Vorbis (that’s like mp3, but without patent encumbrances).

So I’m working on a new initiative to provide short (under ten-second) open-licensed audio clips of examples of the speaking voices of notable people (i.e. people who have Wikipedia articles about them).

What To Do

As a pilot, I’m asking some of my (cough) celebrity friends to kindly record the following, or a variation of their choice, with no background noise:

Hello, my name is [name]. I was born in [place] and I have been [job or position] since [year]

(but without mentioning Wikipedia!) They can do that in a quiet room, with a modern mobile phone or a computer.

[Stop Press: See update 4, below, for update regarding use of “Vocaroo”, to avoid this step]

Once they’ve done that, they can convert the file to Ogg Vorbis using this free tool and then upload it to Wikimedia Commons under an open licence with no “non-commercial (NC)” or “no derivatives (ND)” restrictions (e.g. CC-By or CC-By-SA), and add the category “Voice intro project”.
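
For anyone who prefers the command line, the conversion step can be sketched roughly as below. This is only an illustration and not the free tool mentioned above: it assumes ffmpeg is installed and was built with the libvorbis encoder, and the helper name ogg_vorbis_command, the example file name and the default quality setting are all hypothetical choices of mine.

```python
from pathlib import Path


def ogg_vorbis_command(source, quality=5):
    """Build (but do not run) an ffmpeg command that converts `source` to Ogg Vorbis.

    Assumption: ffmpeg is on the PATH and includes the libvorbis encoder.
    The output file sits next to the input, with an .ogg extension.
    """
    src = Path(source)
    dst = src.with_suffix(".ogg")
    return [
        "ffmpeg",
        "-i", str(src),          # input recording, e.g. an mp3 or wav file
        "-c:a", "libvorbis",     # encode the audio stream as Vorbis
        "-q:a", str(quality),    # variable-bitrate quality level (illustrative default)
        str(dst),
    ]


if __name__ == "__main__":
    # Print the command for a hypothetical recording; pass the list to
    # subprocess.run(..., check=True) to actually perform the conversion.
    print(" ".join(ogg_vorbis_command("intro.mp3")))
```

The function only builds the command, so it can be inspected before anything is run. The resulting .ogg file still needs to be uploaded to Wikimedia Commons with an open licence, as described above.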

If that’s too much fuss, they can e-mail it, or its URL, to me (andy@pigsonthewing.org.uk), using common file formats like mp3 or .wav, stating that it’s under one of those licences, and CC the mail to: permissions-en@wikimedia.org to formally record the open licence. Then I or other Wikipedia editors will make the conversion.

Alternatively, perhaps, they can point to a suitable, open-licensed, example of their speaking voice, which is already online.

Anyone Can Help

If you’re not the subject of a Wikipedia article, you can still help, by recording audio files of machinery or everyday activities and occurrences, and uploading them to Wikimedia Commons as described above.

Updates

1. A couple of Wikipedia article subjects have asked why they would do this. In short, so that there is a public, and freely reusable, record of what they sound like, for current and future generations. And so that we know how they pronounce their names.
2. The uploaded files are now gathered in a Wikimedia Commons category. Thank you to the early contributors.
3. I’ve been asked about multi-lingual recordings. The best thing would be separate files, one in each language, please.
4. If you have a microphone on your computer (doesn’t work on iPhone/iPad), it’s possible to record directly into the Vocaroo website, and just email or tweet me a link. But you still need to agree to an open licence!

39 thoughts on “Requesting open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons”

Andy Mabbett Post author 24 October 2012 at 23:11

0000-0001-5882-6823

Thanks to Jon Bounds for being the first to respond. His sound file is now on Wikimedia Commons; and used on his Wikipedia biography. Who’s next?

Yuvi 25 October 2012 at 20:10

In a larger context, I wonder if an Android application that lets you upload sounds to Commons without the conversion hassle would be a good idea…

Bill Thompson 25 October 2012 at 20:39

I’m in… http://commons.wikimedia.org/wiki/File:Bill_Thompson_speaking.ogg – a great initiative, one that deserves widespread publicity.

B

Jo Brodie 2 November 2012 at 00:38

This is a brilliant idea, I hope it’s widely adopted. I also like Yuvi’s idea very much – a similar-ish sort of thing is the new Radiolab app which allows you to record a tiny clip of you reading a bit of the end-credits (the name of the podcast’s sponsor) which I’m itching to try but always get stagefright whenever I press record, haha.

Michael Smethurst 2 November 2012 at 15:29

Hi Andy

Our thoughts have recently been turning to similar matters. We’ve been working on a project for BBC World Service to take 70,000 English-language programmes and somehow make them available on the web. The big problem is that whilst we have high quality audio files we have no descriptive data about them. So nothing about the subject matter discussed or who’s in them and in some cases not even when they were broadcast.
To fix this we’ve put the audio through a speech to text system which gives us a (very) rough transcript. We’ve then entity extracted the text against DBpedia / Wikipedia concepts to make some navigation by “tags”. Because the speech to text step is noisy some of the tags extracted are not accurate but we’re working with the World Service Global Minds panel (a community of World Service listeners) who are helping us to correct them.
Recent work has been around voice recognition from the same audio files. We’re able to recognise voice patterns but not give a name / identity to the person speaking. Again the Global Minds panel are helping us to put names to these voices.
Obviously it would be better if we could associate these names with Wikipedia / DBpedia concepts to surface programmes about *and* featuring person X.
One suggestion was to compare the audio to speech recordings on Wikimedia. If we found a match we could associate the voice in the World Service archive with its Wikimedia identifier and from there to Wikipedia and from there to DBpedia.
To do that we’d need longer (duration) and higher quality samples than suggested here and we’ve mentioned cultural bodies (BBC, BL etc) opening up speech snippets from their archives. By releasing small nuggets of their archives they’d be putting just enough in place to make the further contextualisation of their (and other) archives possible which feels like a good trade. As ever there are probably rights issues…
@bilt – would anything roughly like this fall under your job description? 🙂

1. Andy Mabbett Post author 2 November 2012 at 16:04
  
  Interesting ideas, Michael, and lots to think about. The project sounds ripe for crowd sourcing; and that could be facilitated by open-licensing some of your recordings. They could then be uploaded to Wikimedia Commons, tagged or categorised there, and you could then reimport the metadata — I have in mind a similar initiative with old photographs from the US National Archives, which worked this way.
  
  Of course, the BBC also has its own collection of voice recordings from named people, against which you could match — and those, or samples from them, would also be useful under an open licence.
  
  I wonder if you have anything by Horace Alexander, Arnold Boyd, Thomas Coward, or other early twentieth-century ornithologists?
  
Michael Smethurst 3 November 2012 at 20:57

Hi Andy

The World Service archive would indeed be a candidate for a complete crowd-sourced approach (with the usual caveats around rights) but the research goals of the project I’m working on are about finding a sweet spot between first-pass machine processing and community correction. Have just posted more over here.

Matching against our own voice archive would only really be useful if we could get from that match to wiki/dbpedia identifiers so would only work if we added them to wikimedia under an open licence.

I’m not the best person to ask about what’s in the archive but if you’ve a genuine interest in Horace, Arnold or Thomas I know the perfect person to ask 🙂

Jo Brodie 21 November 2012 at 02:19

Template suggestion thingy…

When I edited the page for Terry George’s Whole Lotta Sole I discovered there was such a thing as a ‘films made by Terry George’ template thing that I could add. When I then amended the template itself to add in WLS and another of his films I was pleased that it all updated itself nicely.

Similarly because I keep my eyes on that page I am aware when someone has inserted a category or something like that, even if nothing much changes on the page. So people are aware of quite subtle changes. If I write [[text]] around something it redlinks until that page is created at which point it automatically sorts itself out, the link is already in place.

At the moment when people look at pages of notable folk there is no indication that anything’s missing in terms of the lack of voice recording.

Could there be a template for the voice recording that (a) will automatically pick up the formatted recording that’s added to the Wikimedia page and (b) when the template is added to a page won’t show up on the final page (until the sound file is installed) but will make page-watchers go ‘hang on a minute, what’s this clever notion then’?

All the better if the template can also link to the voice recordings page with the link to your blog on it for instructions.

Then we can add these templates to notable people, it won’t affect the page (so hopefully no-one will mind it sort of sitting there, waiting to be activated) and it raises awareness of the project.

Might be impractical though, like all my brilliant ideas 🙂

Jo

Tony Souter 21 November 2012 at 11:30

Ten seconds is pretty short. Any reason for the limit?

1. Andy Mabbett Post author 21 November 2012 at 21:26
  
  It’s not a limit! That’s how long it takes to recite the sample script, which is designed to not be onerous, and to be long enough for the listener to get an impression of what the subject sounds like. But if people want to say more, they can.
  
Pingback: A late answer to a question from the digital humanities conference | Smethurst
Brian McNeil 10 August 2013 at 07:44

Andy, you’re being under-ambitious 😉

The script is nice and simple; sufficiently so to allow someone to repeat until they’re happy with it. They might as well have recorded the video too. And, I’ve a possible solution to the conversion fiddling from the Wikinews Paralympics project.

Pingback: Recording the voices of Wikipedia | Wikimedia UK Blog
Andy Mabbett Post author 25 September 2013 at 21:32

Just a note that I’ve added an update, number 4, about using Vocaroo, which removes the need for converting and transmitting files – just record at that website, and send me a link.

Pingback: Flying pigs gather interesting sounds | Culham Research Group
Pingback: Help Turn Voices from BBC Radio into Open Data for Wikipedia | OpenGLAM
Pingback: Help Turn Voices from BBC Radio into Open Data for Wikipedia | Open Knowledge Foundation Blog
Pingback: Speakerthon: Sharing Voice Samples | Open Education Working Group
Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations
Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generationsSOVIDERS TECH | SOVIDERS TECH
Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations |
Pingback: Proyecto WikiVIP: Wikipedia comienza a incorporar voces de celebridades | Páginas Mendocinas
Pingback: Wikipedia Adding Coice Recordings to Famous People’s Bio Pages - Takes On Tech | Takes On Tech
Pingback: Wikipedia will begin storing celebrities’ voices on their pages | tozandelman.co.nz | Website Design NZ
Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations | Technics and Time
Pingback: 위키백과, 유명인 목소리도 기록한다 | All that Cuteness
Pingback: Wikipedia ajoute l’enregistrement vocal pour les pages de personnes célèbres | BlogNT
Gregory Kohs 27 January 2014 at 19:11

Andy, as the founder of MyWikiBiz (and with Wikipedia link “Gregory Kohs” redirecting to MyWikiBiz article on Wikipedia), would I be welcomed to add a voice recording identifying myself? Are users banned on English Wikipedia (but active and welcome on other WMF projects) permitted to participate? Is a general rule being applied for “corporate” topics having, for example, a company founder identify the company by voice?

1. Andy Mabbett Post author 28 January 2014 at 15:24
  
  Hi Gregory, the files are uploaded to Wikimedia Commons, not en.Wikipedia, so that’s the first hurdle dealt with. As for adding one to an article, I see it as no different to adding a picture. Do note, though, the request for non-controversial content, and please keep your comments neutral, in line with the suggested script and existing examples.
  
  1. Gregory Kohs 28 January 2014 at 17:10
    
    Thanks for the info, Andy. I’m a welcome user on Commons, and I’m actually quite capable of making an audio introduction about myself without foaming at the mouth 😉, so I’ll give this a try in a little while. Obviously, adding the clip to the English Wikipedia would have to be done by someone who’s willing to carry the burden of “proxying for a banned user” accusations that will surely fly… but there is no shortage of drama mongers on Wikipedia who would probably love to test this out.
    
  2. Gregory Kohs 28 January 2014 at 18:38
    
    Bam! https://commons.wikimedia.org/wiki/File:Kohs-MyWikiBiz_VIP.ogg
    
Pingback: Wikipedia project aims to enshrine celebrity voice recordings » myhavens.com
Pingback: Wikipedia project aims to enshrine celebrity voice recordings » Borg Prime
Pingback: Proyecto WikiVIP: Wikipedia comienza a incorporar voces de celebridades | el BLOG de FCASTROG
Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations | Daily Tech News
Pingback: Introducción por voz en Wikipedia: qué es y cómo participar
Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations - The Headlines Now - Live News India, World, Business, Technology, Sports, Fashion, LifeStyle & Entertainment
Pingback: Wikipedia wants celebrity voice recordings, to remove that boredom of just reading | techjaja
Pingback: Wikimania Free Culture Weekend | Wikimedia UK Blog