Requesting open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons

The Idea

A little while ago, my friend and fellow Wikipedia editor (he’s the Wikipedian in Residence at the British Library!) mentioned to me that Wikipedia could do with more sound files. We discussed recordings of music, industrial and everyday sounds (what does a printing press sound like? Or a Volkswagen Beetle? What do different kinds of breakfast cereal sound like when milk is added?), as well as people’s voices, so that we have a record of what they sound like.


Beethoven’s Trumpet (With Ear) By John Baldessari, at the Saatchi Gallery.
Photo by Jim Linwood, on Flickr, CC-BY

In the spirit of Wikipedia, all such recordings would be open-licensed, so that others can use them freely. They can then be uploaded to Wikimedia Commons (the media repository for Wikipedia and its related projects) in an open format, namely Ogg Vorbis (that’s like mp3, but without patent encumbrances).

So I’m working on a new initiative to provide short (under ten-second) open-licensed audio clips of examples of the speaking voices of notable people (i.e. people who have Wikipedia articles about them).

What To Do

As a pilot, I’m asking some of my (cough) celebrity friends to kindly record the following, or a variation of their choice, with no background noise:

Hello, my name is [name]. I was born in [place] and I have been [job or position] since [year]

(but without mentioning Wikipedia!) They can do that in a quiet room, using a modern mobile phone or a computer.

[Stop Press: See update 4, below, about using “Vocaroo” to avoid this step]

Once they’ve done that, they can convert the file to Ogg Vorbis using this free tool, then upload it to Wikimedia Commons under an open licence with no “non-commercial” (NC) or “no derivatives” (ND) restrictions (e.g. CC-By or CC-By-SA), and add the category “Voice intro project”.
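For anyone uploading by hand, the licence and the category can go straight into the file description page. The sketch below is a hypothetical Python helper (the name `description_page` and its parameters are illustrative, not part of any real tool) that assembles wikitext using the standard Commons `{{Information}}` template, a CC licence template and the “Voice intro project” category. It assumes the subject recorded the clip themselves (hence `{{own}}`), and it only accepts the CC-By or CC-By-SA 4.0 templates, mirroring the no-NC, no-ND requirement.

```python
# Hypothetical helper (not part of any real tool): builds wikitext for a
# Wikimedia Commons file description page for a voice clip.
ALLOWED_LICENCES = {"cc-by-4.0", "cc-by-sa-4.0"}  # open licences without NC or ND


def description_page(description: str, date: str, author: str, licence: str) -> str:
    """Return file-description wikitext, rejecting licences not in ALLOWED_LICENCES."""
    if licence not in ALLOWED_LICENCES:
        raise ValueError(f"{licence!r} is not an accepted open licence")
    lines = [
        "== {{int:filedesc}} ==",
        "{{Information",
        f"|description={{{{en|1={description}}}}}",  # renders as {{en|1=...}}
        f"|date={date}",
        "|source={{own}}",  # assumes the subject recorded the clip themselves
        f"|author={author}",
        "}}",
        "",
        "== {{int:license-header}} ==",
        f"{{{{{licence}}}}}",  # e.g. {{cc-by-sa-4.0}}
        "",
        "[[Category:Voice intro project]]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(description_page("Voice of Jane Doe (hypothetical)", "2014-01-25",
                           "Jane Doe", "cc-by-sa-4.0"))
```

Passing a licence such as `cc-by-nc-4.0` raises an error instead of producing a page, so a non-free clip can’t slip into the category by accident.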

If that’s too much fuss, they can e-mail the file, or its URL, to me, using a common file format such as mp3 or .wav, stating that it’s under one of those licences, and CC the mail to: to formally record the open licence. Then I or other Wikipedia editors will make the conversion.
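For editors doing the conversion, one common option (not necessarily the free tool mentioned above) is the ffmpeg command-line program with its libvorbis encoder. This Python sketch only builds the command; it assumes ffmpeg is installed with libvorbis support, and the helper name `ogg_command` and the default quality of 5 are illustrative choices.

```python
import shlex
from pathlib import Path


def ogg_command(source: str, quality: int = 5) -> list[str]:
    """Build (but do not run) an ffmpeg command converting `source` to Ogg Vorbis.

    Assumes ffmpeg is installed with the libvorbis encoder. `quality` is passed
    to -q:a (Vorbis VBR quality; this sketch accepts 0 to 10).
    """
    if not 0 <= quality <= 10:
        raise ValueError("quality must be between 0 and 10")
    target = Path(source).with_suffix(".ogg")  # voice.mp3 -> voice.ogg
    return [
        "ffmpeg", "-i", source,
        "-vn",                  # drop embedded cover art / video streams
        "-c:a", "libvorbis",    # encode the audio as Vorbis
        "-q:a", str(quality),
        str(target),
    ]


if __name__ == "__main__":
    cmd = ogg_command("voice.mp3")
    print(shlex.join(cmd))
    # prints: ffmpeg -i voice.mp3 -vn -c:a libvorbis -q:a 5 voice.ogg
    # To convert for real: subprocess.run(cmd, check=True)
```

The `-vn` flag drops any embedded cover art, which ffmpeg might otherwise try to carry over into the Ogg file as a video stream; `-q:a` selects Vorbis variable-bitrate quality, where higher numbers give larger, better-sounding files.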

Alternatively, perhaps, they can point to a suitable, open-licensed, example of their speaking voice, which is already online.

Anyone Can Help

If you’re not the subject of a Wikipedia article, you can still help by recording audio files of machinery or everyday activities and occurrences, and uploading them to Wikimedia Commons as described above.

Updates

  1. A couple of Wikipedia article subjects have asked why they would do this. In short, so that there is a public — and freely reusable — record of what they sound like, for current and future generations. And so that we know how they pronounce their names.
  2. The uploaded files are now gathered in a Wikimedia Commons category. Thank you to the early contributors.
  3. I’ve been asked about multi-lingual recordings. The best thing would be separate files, one in each language, please.
  4. If you have a microphone on your computer (doesn’t work on iPhone/iPad), it’s possible to record directly into the Vocaroo website, and just email or tweet me a link. But you still need to agree to an open licence!

39 thoughts on “Requesting open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons”

  1. Yuvi

    In a larger context, I wonder if an Android application that lets you upload sounds to Commons without the conversion hassle would be a good idea…

  2. Jo Brodie

    This is a brilliant idea, I hope it’s widely adopted. I also like Yuvi’s idea very much – a similar-ish sort of thing is the new Radiolab app, which allows you to record a tiny clip of yourself reading a bit of the end-credits (the name of the podcast’s sponsor). I’m itching to try it but always get stage fright whenever I press record, haha.

  3. Michael Smethurst

    Hi Andy

    Our thoughts have recently been turning to similar matters. We’ve been working on a project for BBC World Service to take 70,000 English-language programmes and somehow make them available on the web. The big problem is that whilst we have high quality audio files we have no descriptive data about them. So nothing about the subject matter discussed or who’s in them and in some cases not even when they were broadcast.
    To fix this we’ve put the audio through a speech to text system which gives us a (very) rough transcript. We’ve then entity extracted the text against DBpedia / Wikipedia concepts to make some navigation by “tags”. Because the speech to text step is noisy some of the tags extracted are not accurate but we’re working with the World Service Global Minds panel (a community of World Service listeners) who are helping us to correct them.
    Recent work has been around voice recognition from the same audio files. We’re able to recognise voice patterns but not give a name / identity to the person speaking. Again the Global Minds panel are helping us to put names to these voices.
    Obviously it would be better if we could associate these names with Wikipedia / DBpedia concepts to surface programmes about *and* featuring person X.
    One suggestion was to compare the audio to speech recordings on Wikimedia. If we found a match we could associate the voice in the World Service archive with its Wikimedia identifier and from there to Wikipedia and from there to DBpedia.
    To do that we’d need longer (duration) and higher quality samples than suggested here and we’ve mentioned cultural bodies (BBC, BL etc) opening up speech snippets from their archives. By releasing small nuggets of their archives they’d be putting just enough in place to make the further contextualisation of their (and other) archives possible which feels like a good trade. As ever there are probably rights issues…
    @bilt – would anything roughly like this fall under your job description? 🙂

    1. Andy Mabbett Post author

      Interesting ideas, Michael, and lots to think about. The project sounds ripe for crowd sourcing; and that could be facilitated by open-licensing some of your recordings. They could then be uploaded to Wikimedia Commons, tagged or categorised there, and you could then reimport the metadata — I have in mind a similar initiative with old photographs from the US National Archives, which worked this way.

      Of course, the BBC also has its own collection of voice recordings from named people, against which you could match — and those, or samples from them, would also be useful under an open licence.

      I wonder if you have anything by , , , or other early twentieth-century ornithologists?

  4. Michael Smethurst

    Hi Andy

    The World Service archive would indeed be a candidate for a complete crowd-sourced approach (with the usual caveats around rights) but the research goals of the project I’m working on are about finding a sweet spot between first-pass machine processing and community correction. Have just posted more over here.

    Matching against our own voice archive would only really be useful if we could get from that match to wiki/dbpedia identifiers so would only work if we added them to wikimedia under an open licence.

    I’m not the best person to ask about what’s in the archive but if you’ve a genuine interest in Horace, Arnold or Thomas I know the perfect person to ask 🙂

  5. Jo Brodie

    Template suggestion thingy…

    When I edited the page for Terry George’s Whole Lotta Sole I discovered there was such a thing as a ‘films made by Terry George’ template thing that I could add. When I then amended the template itself to add in WLS and another of his films I was pleased that it all updated itself nicely.

    Similarly because I keep my eyes on that page I am aware when someone has inserted a category or something like that, even if nothing much changes on the page. So people are aware of quite subtle changes. If I write [[text]] around something it redlinks until that page is created at which point it automatically sorts itself out, the link is already in place.

    At the moment when people look at pages of notable folk there is no indication that anything’s missing in terms of the lack of voice recording.

    Could there be a template for the voice recording that (a) will automatically pick up the formatted recording that’s added to the Wikimedia page and (b) when the template is added to a page won’t show up on the final page (until the sound file is installed) but will make page-watchers go ‘hang on a minute, what’s this clever notion then’?

    All the better if the template can also link to the voice recordings page with the link to your blog on it for instructions.

    Then we can add these templates to notable people, it won’t affect the page (so hopefully no-one will mind it sort of sitting there, waiting to be activated) and it raises awareness of the project.

    Might be impractical though, like all my brilliant ideas 🙂


    1. Andy Mabbett Post author

      It’s not a limit! That’s how long it takes to recite the sample script, which is designed to not be onerous, and to be long enough for the listener to get an impression of what the subject sounds like. But if people want to say more, they can.

  6. Pingback: A late answer to a question from the digital humanities conference | Smethurst

  7. Brian McNeil

    Andy, you’re being under-ambitious 😉

    The script is nice and simple; sufficiently so to allow someone to repeat it until they’re happy with it. They might as well have recorded the video too. And I’ve a possible solution to the conversion fiddling from the Wikinews Paralympics project.

  8. Pingback: Recording the voices of Wikipedia | Wikimedia UK Blog

  9. Pingback: Flying pigs gather interesting sounds | Culham Research Group

  10. Pingback: Help Turn Voices from BBC Radio into Open Data for Wikipedia | OpenGLAM

  11. Pingback: Help Turn Voices from BBC Radio into Open Data for Wikipedia | Open Knowledge Foundation Blog

  12. Pingback: Speakerthon: Sharing Voice Samples | Open Education Working Group

  13. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations

  14. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generationsSOVIDERS TECH | SOVIDERS TECH

  15. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations |

  16. Pingback: WikiVIP project: Wikipedia begins adding celebrities’ voices | Páginas Mendocinas

  17. Pingback: Wikipedia Adding Coice Recordings to Famous People’s Bio Pages - Takes On Tech | Takes On Tech

  18. Pingback: Wikipedia will begin storing celebrities’ voices on their pages | | Website Design NZ

  19. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations | Technics and Time

  20. Pingback: Wikipedia now records celebrities’ voices too | All that Cuteness

  21. Pingback: Wikipedia adds voice recordings to the pages of famous people | BlogNT

  22. Gregory Kohs

    Andy, as the founder of MyWikiBiz (and with Wikipedia link “Gregory Kohs” redirecting to MyWikiBiz article on Wikipedia), would I be welcomed to add a voice recording identifying myself? Are users banned on English Wikipedia (but active and welcome on other WMF projects) permitted to participate? Is a general rule being applied for “corporate” topics having, for example, a company founder identify the company by voice?

    1. Andy Mabbett Post author

      Hi Gregory, the files are uploaded to Wikimedia Commons, not en.Wikipedia, so that’s the first hurdle dealt with. As for adding one to an article, I see it as no different to adding a picture. Do note, though, the request for non-controversial content, and please keep your comments neutral, in line with the suggested script and existing examples.

      1. Gregory Kohs

        Thanks for the info, Andy. I’m a welcome user on Commons, and I’m actually quite capable of making an audio introduction about myself without foaming at the mouth (😉), so I’ll give this a try in a little while. Obviously, adding the clip to the English Wikipedia would have to be done by someone who’s willing to carry the burden of the “proxying for a banned user” accusations that will surely fly… but there is no shortage of drama-mongers on Wikipedia who would probably love to test this out.

  23. Pingback: Wikipedia project aims to enshrine celebrity voice recordings »

  24. Pingback: Wikipedia project aims to enshrine celebrity voice recordings » Borg Prime

  25. Pingback: WikiVIP project: Wikipedia begins adding celebrities’ voices | el BLOG de FCASTROG

  26. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations | Daily Tech News

  27. Pingback: Voice introductions on Wikipedia: what they are and how to take part

  28. Pingback: Wikipedia adding celebrity voices to wiki pages to preserve them for future generations - The Headlines Now - Live News India, World, Business, Technology, Sports, Fashion, LifeStyle & Entertainment

  29. Pingback: Wikipedia wants celebrity voice recordings, to remove that boredom of just reading | techjaja

  30. Pingback: Wikimania Free Culture Weekend | Wikimedia UK Blog
