Category Archives: open data

Developer needed to make Wikidata’s geographical data compatible with GPS tools

Wikidata needs an open-source developer to make its geographical query results compatible with GPS devices and other geo-spatial tools. Here’s why…

If you query Wikidata (the database sibling project of Wikipedia) for geographically locatable subjects (say, a list of accredited museums in the UK), the results are returned in a table.

When the data includes coordinates, the results can also be displayed on a map with a single click (on the left-hand menu, in desktop view).

The tabular data can be downloaded (via the right-hand menu) in a number of formats, such as CSV, HTML or JSON.
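For anyone who would rather script this than use the web interface, the same kind of query can be sent to the query service directly and the results requested as JSON. The sketch below is illustrative only: the query asks for items that are instances of museum (Q33506), in the United Kingdom (Q145), with a coordinate location (P625). It does not filter for accreditation and ignores subclasses of "museum", and the `format=json` parameter and User-Agent string are assumptions about how you might call the public endpoint.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query only: museums (Q33506) in the UK (Q145) that have a
# coordinate location (P625). Accreditation status is NOT checked here.
QUERY = """SELECT ?item ?itemLabel ?coord WHERE {
  ?item wdt:P31 wd:Q33506 ;
        wdt:P17 wd:Q145 ;
        wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""


def query_url(query, fmt="json"):
    """Build a GET URL asking the query service for results in `fmt`."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": fmt})


def fetch_json(query, user_agent="example-script/0.1 (you@example.org)"):
    """Fetch results over the network (not run in the example below).

    Wikimedia asks clients to send a descriptive User-Agent; the one here
    is a placeholder.
    """
    request = urllib.request.Request(query_url(query), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

`fetch_json(QUERY)` returns the standard SPARQL JSON results structure, with one entry per row under `results` → `bindings`.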

The Wikidata community would like users also to be able to export the data in one or more GPS-friendly formats. These are useful not only for GPS devices, but also for other mapping and visualisation tools. I opened a ticket for this feature request back in 2019!

A patch to do this, supporting GPX, GeoJSON and KML, has been coded. However, it relies on a number of libraries, which in turn introduce numerous dependencies on other libraries. Because these libraries all need to be security-checked, and maintained, using the patch would be cost-prohibitive. As a result, it has been declined.

We are told that it should be possible to code the conversions directly, so that the libraries are not needed, or else to strip out the parts of those libraries that we do not need. This “requires a developer with a bit more understanding of the formats to look into it”.
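To give a sense of scale, here is a minimal sketch of such a direct conversion using only Python's standard library, with no third-party dependencies. It assumes the input is the query service's JSON results format, with variables named `item`, `itemLabel` and `coord` (hypothetical names matching a typical query), and that coordinates arrive as WKT literals such as `Point(-1.8904 52.4862)`, longitude first. It is not the declined patch, and a production version would need more care (escaping, non-Earth globes, other geometry types).

```python
import json
import re
import xml.etree.ElementTree as ET

# WKT point literal, longitude first. Values with a leading globe IRI
# (coordinates on other bodies) do not match and are skipped.
_NUM = r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
WKT_POINT = re.compile(r"^Point\(\s*" + _NUM + r"\s+" + _NUM + r"\s*\)$")


def points(sparql_json, item_var="item", label_var="itemLabel", coord_var="coord"):
    """Yield (label, item_uri, lon, lat) for each row with a usable point."""
    for row in sparql_json["results"]["bindings"]:
        match = WKT_POINT.match(row.get(coord_var, {}).get("value", ""))
        if match is None:
            continue
        lon, lat = float(match.group(1)), float(match.group(2))
        label = row.get(label_var, {}).get("value", "")
        item = row.get(item_var, {}).get("value", "")
        yield label, item, lon, lat


def to_geojson(sparql_json):
    # GeoJSON positions are also [longitude, latitude].
    features = [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [lon, lat]},
         "properties": {"name": label, "item": item}}
        for label, item, lon, lat in points(sparql_json)
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


def to_gpx(sparql_json):
    # GPX 1.1 waypoints carry lat/lon as attributes; the namespace is written
    # as a plain xmlns attribute so element names stay unprefixed.
    gpx = ET.Element("gpx", {"xmlns": "http://www.topografix.com/GPX/1/1",
                             "version": "1.1", "creator": "wikidata-export-sketch"})
    for label, item, lon, lat in points(sparql_json):
        wpt = ET.SubElement(gpx, "wpt", {"lat": str(lat), "lon": str(lon)})
        ET.SubElement(wpt, "name").text = label
        ET.SubElement(wpt, "link", {"href": item})
    return ET.tostring(gpx, encoding="unicode")


def to_kml(sparql_json):
    # KML 2.2 writes a point's coordinates as "lon,lat".
    kml = ET.Element("kml", {"xmlns": "http://www.opengis.net/kml/2.2"})
    doc = ET.SubElement(kml, "Document")
    for label, item, lon, lat in points(sparql_json):
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = label
        point = ET.SubElement(placemark, "Point")
        ET.SubElement(point, "coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```

All three formats share one parsing step, so adding another output format means writing one more small serialiser rather than pulling in a library.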

I’m not a developer, and the nuts-and-bolts of this are mysterious to me.

We need someone with the relevant knowledge and experience, willing to work on an open-source fix, for the common good.

Who will step up and take on this pro-bono work?

United Kingdom parliamentary URL structure: change needed

In Wikidata, Wikipedia’s sister project for storing statements of fact, we record a number of unique identifiers.

For example, Tim Berners-Lee has the VIAF identifier “85312226” and is known to the IMDb as “nm3805083”.

We know that we can convert these to the URLs https://viaf.org/viaf/85312226 and http://www.imdb.com/Name?nm3805083 by adding the prefixes:

  • https://viaf.org/viaf/
  • http://www.imdb.com/Name?

respectively. We only need to store those prefixes in Wikidata once each.
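The store-the-prefix-once idea can be sketched in a few lines. Wikidata records such prefixes in a "formatter URL" property, where `$1` stands for the identifier; the dictionary and the `identifier_url` helper below are hypothetical illustrations of that convention, not Wikidata's own code.

```python
from urllib.parse import quote

# The two prefixes listed above, written in the "$1" placeholder style.
FORMATTER_URLS = {
    "VIAF": "https://viaf.org/viaf/$1",
    "IMDb": "http://www.imdb.com/Name?$1",
}


def identifier_url(source, identifier):
    """Turn a stored identifier into a URL by substituting it for $1."""
    return FORMATTER_URLS[source].replace("$1", quote(identifier, safe=""))
```

Given only the identifier, the URL is fully predictable, which is exactly what the parliamentary URLs below do not allow.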


The Houses of Parliament in August 2014, picture by Henry Kellner, CC BY-SA 3.0

The United Kingdom Parliament website also uses identifiers for MPs and members of the House of Lords.

For example, Tom Watson, an MP, is “1463”, and Jim Knight, aka The Lord Knight of Weymouth, is “4160”.

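However, the respective URLs are http://www.parliament.uk/biographies/commons/tom-watson/1463 for Tom Watson and, for Lord Knight, a different path that likewise places his house and title before “4160”, meaning that the prefixes are not consistent, and require you to know the name or exact title.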

Yet more ridiculous is that, if Tom Watson is ever appointed to the House of Lords, his unique ID won’t change, but the URL of his biography on the parliamentary website will. And because we don’t know whether he would be, say, Lord Watson of Sandwell Valley or Lord Watson of West Bromwich, we can’t predict what it will be.

When building databases, like Wikidata, this is all extremely unhelpful.

What we would like the parliamentary authorities to do — and what would benefit others wanting to make use of parliamentary URLs — is to use a standard, predictable type of URL, for example http://www.parliament.uk/biographies/1463 which uses the unique identifier, but does not require the individual’s house, name or title, and does not change if they shift to “the other place”.

If necessary they could then make that redirect to the longer URLs they prefer (though I wouldn’t recommend it).
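The difference between the two patterns can be sketched as follows. `current_url` and `proposed_url` are hypothetical helper names; the `/commons/` path and the Tom Watson example come from this post, while `proposed_url` builds the URL we are asking for, which the site does not currently support.

```python
BASE = "http://www.parliament.uk/biographies"


def current_url(house, name_slug, member_id):
    # Needs the house and a name or title slug, neither of which can be
    # derived from the member ID alone.
    return f"{BASE}/{house}/{name_slug}/{member_id}"


def proposed_url(member_id):
    # The stable form suggested above: the unique ID is enough.
    return f"{BASE}/{member_id}"
```

A database holding only the ID can always call `proposed_url`; to call `current_url` it would also have to track each member's house and guess any future title.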

I’ve asked them; but they don’t currently do this. In fact they explained their preference for the longer URLs thus:

…we are unable [sic] to shorten the url any further as the purpose of the current pattern of the web address is to display a pathway to the page.

The url also identifies the page i.e the indication of biographies including the name of the respective Member as to make it informative for online users who may view the page.

I find these arguments unconvincing, to say the least.


Screenshot, with Watson's name in the largest font on the page

There’s a big enough clue on the page, without needing to read the URL to identify its subject

Furthermore, the most verbose parts of the URLs are non-functioning; if we truncate Tom’s URL by simply dropping the final digit: http://www.parliament.uk/biographies/commons/tom-watson/146, then we get the biography of a different MP. On the other hand, if we change it to, say: http://www.parliament.uk/biographies/commons/t/1463, we still get Tom’s page. Try them for yourself.
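Because only the trailing number seems to matter, a database could at least normalise any biography URL it meets to the member ID. A minimal sketch, assuming (from the experiments above, not from any documentation) that the ID is always the last path segment; `member_id_from_url` is a hypothetical name:

```python
from urllib.parse import urlsplit


def member_id_from_url(url):
    """Return the numeric member ID at the end of a biography URL, or None."""
    last = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
    return int(last) if last.isdigit() else None
```

Both the full URL and the truncated `/commons/t/1463` form yield 1463, which illustrates that the name part carries no information the site actually uses.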

So, how can we help the people running the Parliamentary website to change their minds, and to use a more helpful URL structure? Who do we need to persuade?

A reply from the UK government to my request for road gritting open data

The government website data.gov.uk, to quote its about page, hosts datasets, as open data, “from all central government departments and a number of other public sector bodies and local authorities” (my emphasis). This is a good thing.

The site’s FAQ says “If there are particular datasets that you believe should be made available more quickly, please use the data request process” (link in original). This is also good.

Accordingly, in September 2012 (that’s sixteen months ago) I submitted a request asking for:

Lists of roads gritted by councils and other bodies, in times of freezing temperatures, with priorities and criteria if applicable.

I specified that those “other bodies” included the Highways Agency, which is “an Executive Agency of the Department for Transport (DfT), and is responsible for operating, maintaining and improving the strategic road network in England on behalf of the Secretary of State for Transport” (again, quoting the HA’s about page) and thus an agency of the UK government. They grit most motorways and certain trunk (“A”) roads.

Highways Agency 1995 Foden Telstar gritter truck, 4 February 2009

A Highways Agency gritting vehicle at work

The data would allow my fellow volunteers and me to label (“tag”) gritting routes in OpenStreetMap, improving satnav routing. Here’s a map of some we’ve already done.

I have, today, received a reply from the Cabinet Office, which I reproduce here in full and verbatim:

Hi Andy,

I am getting in touch with you about your data request. I sincerely apologise for the length of time it has taken to get you a response to your request. Local Authorities are responsible for winter gritting within their boundaries. Local Authorities are data owners and they are responsible for the format, access and cost of their data. This means that you will need to get in touch with the Local Authority’s [sic] who’s [sic] data you are seeking directly for access to their data. Some Local Authorities do publish information about gritting on data.gov.uk, but they do not have a reporting requirement. You may find the following links helpful.

http://data.gov.uk/data/search?q=gritting – Data.gov.uk Local Authorities gritting data
http://www.local.gov.uk/community-safety/-/journal_content/56/10180/3510492/ARTICLE – Local Government Association information on how Local Authorities gritting responsibilities.
https://www.gov.uk/roads-council-will-grit – Access to each local Authority page on gritting
http://www.highways.gov.uk/our-road-network/managing-our-roads/operating-our-network/how-we-manage-our-roads/area-teams/area-9/area-9-our-winter-work/ – Highways Agency information
http://www.highways.gov.uk/about-us/contact-us/ – Contacting the Highways Agency
http://www.highways.gov.uk/freedom-of-information-2/ – information on submitting an Freedom of Information request to the Highways Agency

I am very sorry for the amount of time it has taken us to get back to you. I hope this helps.

Kind Regards,

[name redacted]
Transparency Team
4th Floor
1 Horse Guards Road, London, SW1A 2HQ
Email: [redacted]@cabinet-office.gsi.gov.uk
Find out more about Open Data @ Data.gov.uk

I note the following:

  • Although an apology for the — inordinate — delay in replying is given, no reason for that is offered.
  • It should — surely? — be possible to make one centralised request rather than having to make the same request to every local authority (at the relevant tier) in the country?
  • No mention is made of Highways Agency data, other than links to their web pages, including their FoI page.
  • The reply was sent to me by email, but is not in the Comments section of the page for the request, so is not available to other interested people, including the person who commented in support of it. (I’ll post a link to this post there.)

What do you think?

I hope my recent request, for The National Heritage List for England, receives more prompt consideration and achieves a more positive outcome.

For I’m a Jolly Good Fellow (of the RSA)

I may have been overlooked, once again, in the new year’s honours list, but in mid-December I received an unsolicited and very flattering email; I’d been nominated, by their Regional Programme Manager, to become a Fellow of the Royal Society for the encouragement of Arts, Manufactures and Commerce (the Royal Society of Arts, for short, or RSA, for shorter). The nomination was “for your work on open data, Wikipedia and social media”.

You could have knocked me down with a metaphor.

Royal Society of Arts - from the Strand, London

RSA headquarters
Photo by Elliott Brown, on Flickr, CC-BY

Founded in 1754, the RSA is an independent enlightenment organisation committed to finding practical solutions to today’s social challenges (their email pointed out). That sounded right up my street. I was delighted to accept, and confirmation arrived by e-mail on Wednesday.

I’m in some illustrious company. My fellow fellows include Sir Tim Berners-Lee, Dr Sue Black, Stephen Hawking and Gareth Malone. Past fellows have included Charles Dickens, Benjamin Franklin and Karl Marx!

As a fellow, I shall have use of facilities at the RSA headquarters, off The Strand, pictured above. I shall henceforth refer to this, tongue firmly in cheek, as “my London club”.

My fellowship also means that I now have extra initials after my name. I’m “Andy Mabbett, FRSA”.

But you can still call me Andy.

Requesting open-licensed, open-format recordings of the voices of Wikipedia subjects for Wikimedia Commons

The Idea

A little while ago, my friend and fellow Wikipedia editor (he’s the Wikipedian in Residence at the British Library!) mentioned to me that Wikipedia could do with more sound files. We discussed recordings of music, industrial and everyday sounds (what does a printing press sound like? Or a Volkswagen Beetle? What do different kinds of breakfast cereal sound like when milk is added?), as well as people’s voices, so that we have a record of what they sound like.

A giant ear-trumpet

Beethoven’s Trumpet (With Ear) By John Baldessari, at the Saatchi Gallery.
Photo by Jim Linwood, on Flickr, CC-BY

In the spirit of Wikipedia, all such recordings would be open-licensed, to allow others to use them, freely. They can then be uploaded to Wikimedia Commons (the media repository for Wikipedia and its related projects) in an open format, namely Ogg Vorbis (that’s like mp3, but without patent encumbrances).

So I’m working on a new initiative to provide short (under ten-second) open-licensed audio clips of examples of the speaking voices of notable people (i.e. people who have Wikipedia articles about them).

What To Do

As a pilot, I’m asking some of my (cough) celebrity friends to kindly record the following, or a variation of their choice, with no background noise:

Hello, my name is [name]. I was born in [place] and I have been [job or position] since [year]

(but without mentioning Wikipedia!) They can do that in a quiet room, with a modern mobile phone or a computer.

[Stop Press: see update 4, below, about using “Vocaroo” to avoid this step]

Once they’ve done that, they can convert the file to Ogg Vorbis using this free tool, then upload it to Wikimedia Commons under an open licence with no “non-commercial (NC)” or “no derivatives (ND)” restrictions (e.g. CC-BY or CC-BY-SA), and add the category “Voice intro project”.

If that’s too much fuss, they can e-mail it, or its URL, to me (andy@pigsonthewing.org.uk), using common file formats like mp3 or .wav, stating that it’s under one of those licences, and CC the mail to: permissions-en@wikimedia.org to formally record the open licence. Then I or other Wikipedia editors will make the conversion.

Alternatively, perhaps, they can point to a suitable, open-licensed, example of their speaking voice, which is already online.
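For editors doing the conversion and upload on someone's behalf, the two steps can be scripted. This sketch swaps in ffmpeg (with its `libvorbis` encoder) instead of the free tool mentioned above, and builds a Commons file description page using the standard `{{Information}}` and `{{self}}` templates plus the “Voice intro project” category. It assumes the person uploading is also the author of the recording (hence `{{own}}` and `{{self}}`); the helper names are hypothetical, and a file received by email would need different source and permission details.

```python
def ogg_command(src, dst):
    """Argument list for converting a recording to Ogg Vorbis with ffmpeg.

    -vn drops any video stream (e.g. from a phone recording);
    -q:a 5 picks a mid-range Vorbis quality setting.
    """
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "libvorbis", "-q:a", "5", dst]


def description_page(name, date, author, licence="cc-by-sa-4.0"):
    """Wikitext for the Commons description page of a voice clip."""
    # Refuse non-commercial and no-derivatives licences, as explained above.
    if "-nc" in licence or "-nd" in licence:
        raise ValueError("NC and ND licences cannot be used on Wikimedia Commons")
    return "\n".join([
        "== {{int:filedesc}} ==",
        "{{Information",
        "|description={{en|1=Spoken introduction by " + name + ".}}",
        "|date=" + date,
        "|source={{own}}",
        "|author=" + author,
        "}}",
        "",
        "== {{int:license-header}} ==",
        "{{self|" + licence + "}}",
        "",
        "[[Category:Voice intro project]]",
    ])
```

To run the conversion, pass the list to `subprocess.run(ogg_command("intro.m4a", "intro.ogg"), check=True)`, with ffmpeg installed; the description text can be pasted into the upload form.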

Anyone Can Help

If you’re not the subject of a Wikipedia article, you can still help, by recording audio files of machinery or everyday activities and occurrences, and uploading them to Wikimedia Commons as described above.

Updates

  1. A couple of Wikipedia article subjects have asked why they would do this. In short, so that there is a public — and freely reusable — record of what they sound like, for current and future generations. And so that we know how they pronounce their names.
  2. The uploaded files are now gathered in a Wikimedia Commons category. Thank you to the early contributors.
  3. I’ve been asked about multi-lingual recordings. The best thing would be separate files, one in each language, please.
  4. If you have a microphone on your computer (doesn’t work on iPhone/iPad), it’s possible to record directly into the Vocaroo website, and just email or tweet me a link. But you still need to agree to an open licence!

How should a hackday be run?

I’m working with a large public-sector organisation who have a considerable — and potentially very useful — body of data. They’re keen to open it up, and would like to encourage people to use it by having a hack event of some kind. At the same time, it’s gratifying that they’re clear that they don’t wish to unfairly exploit anyone.

We’re considering a number of options, and would welcome comments and additional suggestions.

The event could be held in the Midlands, over one day or two, on weekdays, at a weekend, or on a Friday and Saturday. Or a competition could be announced online, with a virtual or real-life “Dragons’ Den”-style event, for people to present things they’ve worked on at home.

Cray-2 super computer

You won’t need one of these to take part…
Computer Museum: Cray-2 by cmnit, on Flickr, CC-BY

Should we set a specific challenge, or just ask people to do something interesting with the data?

I’ve suggested that prizes might be offered both for the most complete solution and for the best idea, whether complete or not. There might be prizes in other categories, such as the best idea by a young person or the most accessible product, or different categories for commercial and hobbyist entrants.

The data holders might also like to consider developing business relationships with the developers of one or more of the products, separately from any prize-giving. Otherwise, rights in all the entries would, of course, remain with their developers.

How would you like such an event to happen? We’re aware of the Hackday Manifesto, but what else is best practice, and what other pitfalls should be avoided?

Over to you…

Politician pin ups – open-licensed pictures, please

Politicians, like visits to the dentist and taxes, are a necessary evil. We all moan about them, but someone has to take care of the machinery of state.

So it’s important that we hold them to account, and elsewhere document their activities in a neutral way. Hyperlocal bloggers do the former, and the latter takes place on Wikipedia, and on sites like the excellent OpenlyLocal (both of whose content is open-licensed).

To illustrate such articles, bloggers and Wikipedians need photographs of the politicians (and senior officers). While it’s possible for individuals to take such pictures (and even open-license them, as I described previously), it would be better if such pictures were available from official channels. Such organisations already take or commission professional quality shots and make them available to the press. If they don’t already, they should make sure that their contract with photographers pays for full rights, enabling open-licensing.

I recently asked Birmingham City Council’s press office to make their pictures of members of BCC’s cabinet available under an open licence, and, to their credit, they did so. I was then able to use one of them on Wikipedia:

Wikipedia article using a picture open-licensed by Birmingham City Council

Some might ask “but what if the pictures are misused, to misrepresent those people”. Well, if someone’s going to do that, then they won’t bother about copyright anyway, and other laws (libel, human rights) already enable redress.

So come on, all you councils, civil service departments, police forces and authorities and so on: let us have pictures of your elected members and senior officers, free (i.e. with no “non-commercial” or “no derivatives” restrictions) for reuse on our blogs, Wikipedia and other sites. Major companies, too, could do this for their most-public board members.

Then there’s all public bodies’ other photographs. After all, West Midlands Police kindly agreed to my request to open-license the fantastic aerial shots from their helicopter…

St. Martin in the Bullring Church, Birmingham
Birmingham’s Bull Ring, from the West Midlands Police helicopter. Although this picture is ©WM Police, I can use it, here and on Wikipedia, because they kindly make it available under a CC-BY-SA licence

Bullet points from UK Govcamp 2012

I spent Friday and Saturday at UKGovCamp2012, a splendid unconference, in London, for people interested in the use of digital technologies in local and national government. Or “Glasto for Geeks” as it has famously been described. My friend and fellow attendee Dan Slee has suggested that we all blog a list of 20 thoughts we brought away from the event. I’m happy to oblige.

Steph Gray planning sessions at UKGovCamp 2012. Picture by David J Pearson; some rights reserved.

  1. Our national and London rail systems are overpriced, and the former’s ticketing is ridiculously over-complicated.
  2. It’s a good idea to walk (or cycle) through London, rather than getting the tube. You’ll see great architecture and public art, and get a better impression of how the various districts are laid out. But wear sensible shoes.
  3. Geeks do have great senses of humour. Especially those at our generous hosts and butt of jokes, Microsoft.
  4. There is still a lot of uncertainty about Open Data — what’s it for, what do we want, how should we use it. This is good, because — despite some valid concerns about the centralisation of innovation more generally — there is still room for us to innovate with Open Data.
  5. There are a lot of Brompton bikes in London. I’m determined to take mine on a future trip.
  6. We need better systems in place for using social media in responding to emergency situations. Expect some exciting news about a new project I and some fellow attendees are planning, soon.
  7. Anke Holst does not appear old enough to have a teenage child.
  8. When beta.gov.uk comes out of beta, and current .gov.uk domains are “retired”, it’s really, really important that existing links to them, from external sites, still work. And by work, I mean go to relevant content, not a home page. As a very wise man once said, “Cool URIs don’t change”.
  9. It’s possible to spend one or two days at an event with good friends, and still fail to say hello to them. Apologies if that’s you.
  10. Open Data and Freedom of Information are opposite sides of the same coin. If an organisation has people responsible for Open Data and FoI, and those people are neither the same nor closely linked, then that organisation has a problem.
  11. Terence Eden is not only (with his lovely wife Liz) a generous host, but also an impressively entertaining speaker. If his day job fails (it won’t) he has a viable alternative career in stand-up observational comedy. I went to his QR code session not only to learn, but to enjoy.
  12. If you ask them, people who share will kindly change their settings, so others can tag them.
  13. If you put three expert™ Wikipedia editors together in a room you will get at least four interpretations of the Conflict of Interest policy.
  14. Twitter still rocks. It’s so ubiquitous (to us) that we forget that, and that some people still don’t get it.
  15. There are, contrary to popular perception, people working in Government who are keen to make, and do make, the images they produce available under open licences, so that others may reuse them. OpenAttribute may be useful to them.
  16. I want a Scottevest!
  17. People like having the #ukgc12 bookmarks curated on Pinboard.
  18. People recently turned, or thinking of becoming, freelance need more advice and help, and perhaps a support network.
  19. If our wonderful organisers Dave Briggs and Steph Gray are “the Lennon and McCartney of gov digital people”, who is going to be The Frog Chorus?
  20. Beer tastes even better when it’s free. Thank you, kind sponsors.
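Point 8, about keeping old links working when domains are retired, comes down to a lookup table maintained by whoever retires the site. A minimal sketch, in which the retired hostname and path are hypothetical (the target is the gritting page linked earlier on this blog): known pages get a permanent redirect to their equivalent, and anything unmapped gets an honest “Gone” rather than being dumped on a home page.

```python
from urllib.parse import urlsplit

# Hypothetical map from pages on a retired site to their replacements.
REDIRECTS = {
    ("www.old-department.gov.uk", "/gritting"): "https://www.gov.uk/roads-council-will-grit",
}


def resolve(old_url):
    """Return (HTTP status, Location or None) for a request to a retired URL."""
    parts = urlsplit(old_url)
    key = (parts.hostname, parts.path.rstrip("/") or "/")
    target = REDIRECTS.get(key)
    if target is not None:
        return 301, target  # permanent redirect to the equivalent page
    return 410, None        # "Gone", instead of a misleading home-page redirect
```

The design choice is the fallback: a 410 tells people and search engines the truth, whereas redirecting everything to a home page silently breaks every deep link.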

See you there next year!

Open-licensing your images. What it means and how to do it.

I do a lot of editing on Wikipedia. Sometimes I approach someone connected with a subject I’m writing about (or the subject themself), and ask them to provide an “open licensed” image. In other words, an image whose copyright they own, but given a licence which allows anyone to reuse it, even for commercial purposes.

With a few exceptions, only images made available under such licences can be used on Wikipedia.

Creative Commons

The commonest form of open licence is Creative Commons — a set of legalistic prose documents which cover various ways of licensing images.

Some Creative Commons licences include “non-commercial” (“NC”) clauses; these are incompatible with Wikipedia, because people are allowed to reuse content from Wikipedia in commercial situations, such as in newspapers or in apps which are sold for use on mobile devices (provided they comply with other licence terms). The same applies to “no derivatives” (“ND”) clauses, which mean that people cannot edit, crop, recolour or otherwise change your picture when reusing it.

The Creative Commons licences compatible with Wikipedia are:

  • Attribution Creative Commons (CC-BY)
  • Attribution-ShareAlike Creative Commons (CC-BY-SA)

In which:

  • “Attribution” means that the copyright holder must be given a credit
  • “ShareAlike” means that if someone uses your picture, anything made with it must have the same licence
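The rule of thumb above (Attribution is fine, ShareAlike is fine, NC or ND rules a licence out) can be expressed as a small check. This is a beginner's sketch covering only the licences discussed in this post; the function name is hypothetical, and it deliberately treats anything other than CC-BY and CC-BY-SA (including licences not mentioned here) as “not compatible”.

```python
import re

ALLOWED = {"BY", "SA"}


def licence_elements(code):
    """Split a licence label such as "CC BY-SA 3.0" into its elements."""
    tokens = re.split(r"[\s\-]+", code.strip().upper())
    # Drop the "CC" marker and version numbers such as "3.0".
    return {t for t in tokens if t and t != "CC" and not re.fullmatch(r"\d+(\.\d+)*", t)}


def is_wikipedia_compatible(code):
    """True only for CC-BY and CC-BY-SA style labels (any version)."""
    elements = licence_elements(code)
    return "BY" in elements and elements <= ALLOWED
```

So “CC BY 3.0” and “CC-BY-SA” pass, while “CC-BY-NC-SA”, “CC-BY-ND” and “All Rights Reserved” do not.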

It’s important that anyone open-licensing an image understands what that means. For example, Wikimedia (the organisation behind Wikipedia) suggests that people donating images are asked to agree to the following:

  • I acknowledge that I grant anyone the right to use the work in a commercial product, and to modify it according to their needs, as long as they abide by the terms of the license and any other applicable laws.
  • I am aware that I always retain copyright of my work, and retain the right to be attributed in accordance with the license chosen. Modifications others make to the work will not be claimed to have been made by me.
  • I am aware that the free license only concerns copyright, and I reserve the option to take action against anyone who uses this work in a libelous way, or in violation of personality rights, trademark restrictions, etc.
  • I acknowledge that I cannot withdraw this agreement…

(and yes, that wording has a CC-BY-SA licence!)

Which is the best licence to use?

That depends on the circumstances, but CC-BY-SA fits most cases, giving the re-user the greatest flexibility, while protecting the copyright holder’s right to be recognised.

So, how do I open-licence an image?

There are a variety of ways to open-licence an image. Here are some of the commonest:

  • Upload your images to Wikimedia Commons, the media repository for Wikipedia and other Wikimedia projects
  • Upload your images to Flickr, specifying one of the above open licences
  • Upload your images to your own website, with a clear and unambiguous statement that they are under a specified open licence
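For the third option, one way to make the statement clear and unambiguous is to put it next to the image with a link to the licence deed, marked with the HTML `rel="license"` link type so that software can find it. A minimal sketch; the helper name and the 4.0 default version are assumptions (choose the version you actually intend):

```python
from html import escape


def licence_statement(title, author, licence="by-sa", version="4.0"):
    """Return an HTML snippet declaring a Creative Commons licence for an image."""
    if licence not in ("by", "by-sa"):
        raise ValueError("use CC BY or CC BY-SA for Wikipedia compatibility")
    url = f"https://creativecommons.org/licenses/{licence}/{version}/"
    name = f"CC {licence.upper()} {version}"
    return (f"<p>{escape(title)} by {escape(author)} is available under "
            f'<a rel="license" href="{url}">{name}</a>.</p>')
```

For example, `licence_statement("Gorilla & keeper", "A. N. Other")` produces a paragraph naming CC BY-SA 4.0 and linking to its deed, with the ampersand safely escaped.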

My images are on Flickr, how do I change the licence?

To open-licence a single image in Flickr:

Selecting an open licence in Flickr's pop-up dialogue

  • View the specific image
  • Under “Owner settings”, alongside current licence setting (perhaps “All Rights Reserved”), click “edit”
  • In the pop-up window, check one of the compatible licences
  • Save
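If you have many photos to change, Flickr's API has a method for this. The sketch below only assembles the parameters for `flickr.photos.licenses.setLicense`. It assumes licence IDs 4 (Attribution) and 5 (Attribution-ShareAlike) as Flickr lists them, which is worth confirming with `flickr.photos.licenses.getInfo` first, and it leaves out the OAuth signing (with write permission) and the POST request that a real client must perform. The helper name is hypothetical.

```python
# Assumed Flickr licence IDs; confirm with flickr.photos.licenses.getInfo.
FLICKR_LICENCE_IDS = {"CC-BY": 4, "CC-BY-SA": 5}


def set_licence_params(photo_id, licence="CC-BY-SA"):
    """Parameters for one flickr.photos.licenses.setLicense call.

    These still need to be OAuth-signed and POSTed to Flickr's REST
    endpoint by a real client; that part is not shown here.
    """
    return {
        "method": "flickr.photos.licenses.setLicense",
        "photo_id": str(photo_id),
        "license_id": str(FLICKR_LICENCE_IDS[licence]),
        "format": "json",
        "nojsoncallback": "1",
    }
```

Looping over your photo IDs and sending one signed request per photo does the same as repeating the steps above by hand.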

[Postscript: My friend John Cummings wrote an equivalent guide for YouTube]

Won’t I lose money doing this?


Not necessarily. Some commercial photographers release low- or medium- resolution copies of their images, and sell high-resolution copies, but most people take images for personal purposes, which have no commercial value, and for which they will never be paid. Open-licensing them enables the community to benefit, at no cost to the photographer. Think of open-licensing your images as a way of giving back to the community which has given you so many open-source tools, without which the web would not work.

If this post has inspired you to open-licence your images, please let me know in the comments.

And yes you can use other people’s open-licenced images, including many of mine, free. Help yourself!

Caveat

Yes, I know there are other open licences, and more complex use-cases. This is intended as a beginners’ guide. A competent lawyer will be able to provide you with legal advice. I offer more general advice to institutions wanting to open-licence their images or other content, or to work with the Wikipedia community, as part of my professional services.

Licence

This post is available under a Creative Commons Attribution-ShareAlike (3.0 Unported) licence. Attribution should include a link to the post, or, in print, the short URL http://wp.me/p10xWg-jM.