Requesting open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons

The Idea

A little while ago, my friend and fellow Wikipedia editor (he’s the Wikipedian in Residence at the British Library!) mentioned to me that Wikipedia could do with more sound files. We discussed recordings of music, industrial and everyday sounds (what does a printing press sound like? Or a Volkswagen Beetle? What do different kinds of breakfast cereal sound like when milk is added?), as well as people’s voices, so that we have a record of what they sound like.

Image: Beethoven’s Trumpet (With Ear) by John Baldessari, at the Saatchi Gallery. Photo by Jim Linwood, on Flickr, CC-BY.

In the spirit of Wikipedia, all such recordings would be open-licensed, so that others can use them freely. They can then be uploaded to Wikimedia Commons (the media repository for Wikipedia and its related projects) in an open format, namely Ogg Vorbis (that’s like mp3, but without patent encumbrances).

So I’m working on a new initiative to provide short (under ten seconds) open-licensed audio clips of the speaking voices of notable people (i.e. people who have Wikipedia articles about them).

What To Do

As a pilot, I’m asking some of my (cough) celebrity friends to kindly record the following, or a variation of their choice, with no background noise:

Hello, my name is [name]. I was born in [place] and I have been [job or position] since [year]

(but without mentioning Wikipedia!) They can do that in a quiet room, with a modern mobile phone or a computer.

[Stop Press: See update 4, below, for update regarding use of “Vocaroo”, to avoid this step]

Once they’ve done that, they can convert the file to Ogg Vorbis using this free tool and then upload it to Wikimedia Commons under an open licence with no “non-commercial (NC)” or “no derivatives (ND)” restrictions (e.g. CC-By or CC-By-SA), and add the category “Voice intro project”.

If that’s too much fuss, they can e-mail it, or its URL, to me, in a common file format such as mp3 or .wav, stating that it’s under one of those licences, and CC the mail to […] to formally record the open licence. Then I or other Wikipedia editors will make the conversion.

Alternatively, perhaps, they can point to a suitable, open-licensed, example of their speaking voice, which is already online.
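
For anyone comfortable with a command line, the conversion step described above can also be scripted. The sketch below is only an illustration: it swaps in ffmpeg (not the free tool linked in this post) and assumes an ffmpeg build with the libvorbis encoder is installed. The function names and the file name intro.mp3 are hypothetical.

```python
from pathlib import Path
import shutil
import subprocess


def build_ogg_command(source, quality=5):
    """Return the ffmpeg argument list that converts `source` to Ogg Vorbis.

    Assumption: ffmpeg with libvorbis is available; this is not the tool
    the post links to. The output sits next to the input, with an .ogg suffix.
    """
    src = Path(source)
    dst = src.with_suffix(".ogg")
    return [
        "ffmpeg",
        "-i", str(src),        # input recording (mp3, wav, ...)
        "-vn",                 # ignore any video or cover-art stream
        "-c:a", "libvorbis",   # encode the audio as Vorbis
        "-q:a", str(quality),  # variable-bitrate quality level
        str(dst),
    ]


def convert_to_ogg(source, quality=5):
    """Run the conversion and return the path of the new .ogg file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    command = build_ogg_command(source, quality)
    subprocess.run(command, check=True)  # raises if ffmpeg reports an error
    return Path(command[-1])
```

Calling convert_to_ogg("intro.mp3") would then leave an Ogg Vorbis file beside the original, ready for upload to Wikimedia Commons. The licence statement and the category still have to be added by hand at upload time.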

Anyone Can Help

If you’re not the subject of a Wikipedia article, you can still help, by recording and uploading to Wikimedia Commons audio files, as described above, of machinery or everyday activities and occurrences.

Updates


  1. A couple of Wikipedia article subjects have asked why they would do this. In short, so that there is a public — and freely reusable — record of what they sound like, for current and future generations. And so that we know how they pronounce their names.
  2. The uploaded files are now gathered in a Wikimedia Commons category. Thank you to the early contributors.
  3. I’ve been asked about multi-lingual recordings. The best thing would be separate files, one in each language, please.
  4. If you have a microphone on your computer (doesn’t work on iPhone/iPad), it’s possible to record directly into the Vocaroo website, and just email or tweet me a link. But you still need to agree to an open licence!

About Andy Mabbett

Enjoying my freelance career, helping organisations to understand on-line communities, open content, and related issues; often as a Wikimedian (or Wikipedian) in Residence.
This entry was posted in ideas, open data, Wikipedia.

39 Responses to Requesting open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons

  1. Andy Mabbett says:

    Thanks to Jon Bounds for being the first to respond. His sound file is now on Wikimedia Commons, and used on his Wikipedia article. Who’s next?

  2. Yuvi says:

    In a larger context, I wonder if an Android application that lets you upload sounds to Commons without the conversion hassle would be a good idea…

  3. Bill Thompson says:

    I’m in… – a great initiative, one that deserves widespread publicity.


  4. Jo Brodie says:

    This is a brilliant idea, I hope it’s widely adopted. I also like Yuvi’s idea very much – a similar-ish sort of thing is the new Radiolab app which allows you to record a tiny clip of you reading a bit of the end-credits (the name of the podcast’s sponsor) which I’m itching to try but always get stagefright whenever I press record, haha.

  5. Hi Andy

    Our thoughts have recently been turning to similar matters. We’ve been working on a project for BBC World Service to take 70,000 English-language programmes and somehow make them available on the web. The big problem is that whilst we have high quality audio files we have no descriptive data about them. So nothing about the subject matter discussed or who’s in them and in some cases not even when they were broadcast.
    To fix this we’ve put the audio through a speech to text system which gives us a (very) rough transcript. We’ve then entity extracted the text against DBpedia / Wikipedia concepts to make some navigation by “tags”. Because the speech to text step is noisy some of the tags extracted are not accurate but we’re working with the World Service Global Minds panel (a community of World Service listeners) who are helping us to correct them.
    Recent work has been around voice recognition from the same audio files. We’re able to recognise voice patterns but not give a name / identity to the person speaking. Again the Global Minds panel are helping us to put names to these voices.
    Obviously it would be better if we could associate these names with Wikipedia / DBpedia concepts to surface programmes about *and* featuring person X.
    One suggestion was to compare the audio to speech recordings on Wikimedia. If we found a match we could associate the voice in the World Service archive with its Wikimedia identifier and from there to Wikipedia and from there to DBpedia.
    To do that we’d need longer (duration) and higher quality samples than suggested here and we’ve mentioned cultural bodies (BBC, BL etc) opening up speech snippets from their archives. By releasing small nuggets of their archives they’d be putting just enough in place to make the further contextualisation of their (and other) archives possible which feels like a good trade. As ever there are probably rights issues…
    @bilt – would anything roughly like this fall under your job description? 🙂

  6. Hi Andy

    The World Service archive would indeed be a candidate for a complete crowd-sourced approach (with the usual caveats around rights) but the research goals of the project I’m working on are about finding a sweet spot between first-pass machine processing and community correction. Have just posted more over here.

    Matching against our own voice archive would only really be useful if we could get from that match to wiki/dbpedia identifiers so would only work if we added them to wikimedia under an open licence.

    I’m not the best person to ask about what’s in the archive but if you’ve a genuine interest in Horace, Arnold or Thomas I know the perfect person to ask 🙂

  7. Jo Brodie says:

    Template suggestion thingy…

    When I edited the page for Terry George’s Whole Lotta Sole I discovered there was such a thing as a ‘films made by Terry George’ template thing that I could add. When I then amended the template itself to add in WLS and another of his films I was pleased that it all updated itself nicely.

    Similarly because I keep my eyes on that page I am aware when someone has inserted a category or something like that, even if nothing much changes on the page. So people are aware of quite subtle changes. If I write [[text]] around something it redlinks until that page is created at which point it automatically sorts itself out, the link is already in place.

    At the moment when people look at pages of notable folk there is no indication that anything’s missing in terms of the lack of voice recording.

    Could there be a template for the voice recording that (a) will automatically pick up the formatted recording that’s added to the Wikimedia page and (b) when the template is added to a page won’t show up on the final page (until the sound file is installed) but will make page-watchers go ‘hang on a minute, what’s this clever notion then’?

    All the better if the template can also link to the voice recordings page with the link to your blog on it for instructions.

    Then we can add these templates to notable people, it won’t affect the page (so hopefully no-one will mind it sort of sitting there, waiting to be activated) and it raises awareness of the project.

    Might be impractical though, like all my brilliant ideas 🙂


  8. Tony Souter says:

    Ten seconds is pretty short. Any reason for the limit?

    • Andy Mabbett says:

      It’s not a limit! That’s how long it takes to recite the sample script, which is designed to not be onerous, and to be long enough for the listener to get an impression of what the subject sounds like. But if people want to say more, they can.

  9. Pingback: A late answer to a question from the digital humanities conference | Smethurst

  10. Brian McNeil says:

    Andy, you’re being under-ambitious 😉

    The script is nice and simple; sufficiently so to allow someone to repeat until they’re happy with it. They might as well have recorded the video too. And, I’ve a possible solution to the conversion fiddling from the Wikinews Paralympics project.

  11. Pingback: Recording the voices of Wikipedia | Wikimedia UK Blog

  12. Andy Mabbett says:

    Just a note that I’ve added an update, number 4, about using Vocaroo, which removes the need for converting and transmitting files – just record at that website, and send me a link.

  13. Pingback: Flying pigs gather interesting sounds | Culham Research Group

  14. Pingback: Help Turn Voices from BBC Radio into Open Data for Wikipedia | OpenGLAM

  15. Pingback: Help Turn Voices from BBC Radio into Open Data for Wikipedia | Open Knowledge Foundation Blog

  16. Pingback: Speakerthon: Sharing Voice Samples | Open Education Working Group

  17. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations

  18. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations | SOVIDERS TECH

  19. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations

  20. Pingback: Proyecto WikiVIP: Wikipedia comienza a incorporar voces de celebridades | Páginas Mendocinas

  21. Pingback: Wikipedia Adding Coice Recordings to Famous People’s Bio Pages - Takes On Tech | Takes On Tech

  22. Pingback: Wikipedia will begin storing celebrities’ voices on their pages | Website Design NZ

  23. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations | Technics and Time

  24. Pingback: 위키백과, 유명인 목소리도 기록한다 | All that Cuteness

  25. Pingback: Wikipedia ajoute l’enregistrement vocal pour les pages de personnes célèbres | BlogNT

  26. Gregory Kohs says:

    Andy, as the founder of MyWikiBiz (and with Wikipedia link “Gregory Kohs” redirecting to MyWikiBiz article on Wikipedia), would I be welcomed to add a voice recording identifying myself? Are users banned on English Wikipedia (but active and welcome on other WMF projects) permitted to participate? Is a general rule being applied for “corporate” topics having, for example, a company founder identify the company by voice?

    • Andy Mabbett says:

      Hi Gregory, The files are uploaded to Wikimedia Commons, not en.Wikipedia, so that’s the first hurdle dealt with. As for adding one to an article, I see it as no different to adding a picture. Do note, though, the request for non-controversial content, and please keep your comments neutral, in line with the suggested script and existing examples.

  27. Pingback: Wikipedia project aims to enshrine celebrity voice recordings

  28. Pingback: Wikipedia project aims to enshrine celebrity voice recordings » Borg Prime

  29. Pingback: Proyecto WikiVIP: Wikipedia comienza a incorporar voces de celebridades | el BLOG de FCASTROG

  30. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations | Daily Tech News

  31. Pingback: Introducción por voz en Wikipedia: qué es y cómo participar

  32. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations - The Headlines Now - Live News India, World, Business, Technology, Sports, Fashion, LifeStyle & Entertainment

  33. Pingback: Wikipedia wants celebrity voice recordings, to remove that boredom of just reading | techjaja

  34. Pingback: Wikimania Free Culture Weekend | Wikimedia UK Blog
