Tag Archives: Wikidata

Developer needed to make Wikidata’s geographical data compatible with GPS tools

Wikidata needs an open-source developer to make its geographical query results compatible with GPS devices and other geo-spatial tools. Here’s why…

If you query Wikidata (the database sibling project of Wikipedia) for geographically locatable subjects (say, a list of accredited museums in the UK) the results are returned in a table.

When the data has coordinates, with a single click (on the left-hand menu, in desktop view) the results can also be displayed on a map.

The tabular data can be downloaded (via the right-hand menu) in a number of formats, such as CSV, HTML or JSON.

The Wikidata community would like users also to be able to export the data in one or more GPS-friendly formats. These are not only useful for GPS devices, but are compatible with other mapping and visualisation tools. I opened a ticket for this feature request in 2019!

A patch to do this, supporting GPX, GeoJSON and KML, has been coded. However, it relies on a number of libraries, which in turn introduce numerous dependencies on other libraries. Because these libraries all need to be security-checked, and maintained, using the patch would be cost-prohibitive. As a result, it has been declined.

We are told that it should be possible to code the conversions directly, so that the libraries are not needed, or to remove what we do not need from those libraries. This “requires a developer with a bit more understanding of the formats to look into it”.
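To give a flavour of what “coding the conversions directly” might involve, here is a minimal Python sketch using only the standard library. It is not the declined patch, and not necessarily in the language the query service itself uses. It assumes each result row is a dictionary with hypothetical `label` and `coord` keys, and that the coordinate arrives as a WKT literal of the form `Point(longitude latitude)`. KML is left out for brevity.

```python
import json
import re
from xml.sax.saxutils import escape

# Assumption: coordinates arrive as WKT literals such as
# "Point(-1.908395 52.478587)", longitude first, then latitude.
_WKT_POINT = re.compile(
    r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)", re.IGNORECASE
)


def parse_wkt_point(wkt):
    """Return (longitude, latitude) as floats from a WKT point literal."""
    match = _WKT_POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def to_geojson(rows):
    """Build a GeoJSON FeatureCollection string, one Point feature per row."""
    features = []
    for row in rows:
        lon, lat = parse_wkt_point(row["coord"])
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": row["label"]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})


def to_gpx(rows):
    """Build a minimal GPX 1.1 document string, one waypoint per row."""
    waypoints = []
    for row in rows:
        lon, lat = parse_wkt_point(row["coord"])
        name = escape(row["label"])  # escape &, < and > for XML
        waypoints.append(f'  <wpt lat="{lat}" lon="{lon}"><name>{name}</name></wpt>')
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<gpx version="1.1" creator="sketch" '
        'xmlns="http://www.topografix.com/GPX/1/1">\n'
        + "\n".join(waypoints)
        + "\n</gpx>\n"
    )
```

Both functions take the same list of rows, so a real implementation could offer each format as a separate download option without any third-party dependency.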

I’m not a developer, and the nuts-and-bolts of this are mysterious to me.

We need someone with the relevant knowledge and experience, willing to work on an open-source fix, for the common good.

Who will step up and take on this pro-bono work?

HASLibCamp

I remember fondly my first unconference, UKLocalGovCamp, in Birmingham in 2009. It really was life-changing for me; the way it opened my eyes to the possibilities of disruptive innovation was the catalyst for me eventually becoming a freelancer. I made many long-standing friends, makers and activists inside and outside local government, there, too.

Since then I’ve attended many more, and have run or facilitated a good number of unconferences, and unconference-style sessions within more traditional types of events. If I say so myself, it’s something I’m good at, and I certainly enjoy it.

So I was pleased, last Saturday, to be able to spend the day in London to attend, and help to facilitate, HASLibCamp, an unconference for librarians in the field of Health and Science. That’s a relatively narrow focus, and so the event wasn’t as big as some I’ve been to, but in no way did that diminish the quality of the sessions, nor the participants’ enthusiasm.

After housekeeping and introductions on behalf of our host, the Department of Library & Information Science at London City University (CityLIS), I asked for a few shows-of-hands, and quickly determined that we had been joined by academic, commercial, hospital and public librarians, and archivists, as well as some student librarians.

I then explained how unconferences work, and invited any of them who wanted to, to give a thirty-second “pitch” for a session in which they wanted to participate, or a topic they wished to discuss.

Luckily, we had just the right number of pitches for the rooms and timeslots available, leaving two gaps after lunch, which were filled during the day with follow-up sessions. A “chill out” room was also available, as was space for ad-hoc discussions and meetings.


whiteboard with list of sessions, in timetable format

The pitched sessions.

I pitched two sessions, one on ORCID identifiers (what they are, and how librarians can help to embed them in their organisations), and another – in response to a request received before the event – on Wikipedia, Wikidata and my work as a Wikimedian in Residence.

I also attended a session on what public libraries might learn about giving healthcare information, from academic libraries. Several resources were mentioned and are linked in the Storify reporting (see below). My final session was billed as being about understanding customer needs, and took the form of a lateral-thinking exercise.

Here’s a brief roundup of coverage elsewhere:

I’m grateful to the Consortium of Independent Health Libraries in London (CHILL) for sponsoring my attendance at the event (a condition of which was that this blog post be written).

Documenting public art, on Wikipedia

Wikipedia has a number of articles listing public artworks (statues, murals, etc) in counties, cities and towns, around the world. For example, in Birmingham. There’s also a list of the lists.

Gilded statue of three men

Boulton, Watt and Murdoch (1956) by William Bloye.
Image by Oosoom, CC BY-SA 3.0

There are, frankly, not enough of these articles; and few of those that do exist are anywhere near complete (the best is probably the list for Westminster).

How you can help

I invite you to collaborate with me, to make more lists, and to populate them.

You might have knowledge of your local artwork, or be able to visit your nearest library to make enquiries; or to take pictures (in the United Kingdom, of “permanent” works, for copyright reasons; for other countries, read up on local ‘Freedom of Panorama’) and upload them to Wikimedia Commons, or even just find coordinates for items added by someone else. If you’re a hyperlocal blogger, or a journalist, perhaps you can appeal to your readership to assist?

Practical steps

You can enter details of an artwork using the “Public art row” family of templates. A blank entry looks like:


{{Public art row
| image =
| commonscat =
| subject =
| location =
| date =
| show_artist= yes
| artist =
| type =
| material =
| dimensions =
| designation =
| coordinates =
| owner =
| show_wikidata= yes
| wikidata =
| notes =
}}

(change “yes” to “no” if a particular column isn’t wanted) and you simply type in the information you have, like this:


{{Public art row
| image = Boulton, Watt and Murdoch.jpg
| commonscat = Statue of Boulton, Watt and Murdoch, Birmingham
| subject = ''[[Boulton, Watt and Murdoch]]''
| location = Near the House of Sport – Broad Street
| date = {{Start date|1956}}
| artist = [[William Bloye]]
| type = statue
| material = Gilded [[Bronze]]
| dimensions = 10 feet tall
| designation = Grade II listed
| coordinates = 52.478587,-1.908395
| owner = [[Birmingham City Council]]
| show_wikidata= yes
| wikidata = Q4949742
| notes = <ref>http://www.birminghammail.co.uk/whats-on/things-to-do/top-5-statues-birmingham-5678972</ref>
}}

Apart from the subject, all the values are optional.

In the above (as well as some invented values for illustrative purposes):

but if that’s too complicated, you can just enter text values, and someone else will come along and do the formatting (experienced Wikipedians can use the {{Coord}} template for coordinates, too). If you get stuck, drop me a line, or ask for help at Wikipedia’s Teahouse.

What this does

The “Public art row” template makes it easy to enter data, keeps everything tidy and consistently formatted, and makes the content machine-readable. That means that we can parse all the contents and enter them into Wikidata, creating new items if required, as we go.
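As an illustration of what “machine-readable” means here, the following Python sketch pulls the parameters out of a single “Public art row” entry. It assumes, as in the examples above, that each parameter sits on its own line beginning with a pipe. The function name is hypothetical, and a real import tool would more likely use a proper wikitext parser.

```python
def parse_public_art_row(wikitext):
    """Return the template's parameters as a dict, assuming one per line."""
    params = {}
    for line in wikitext.splitlines():
        line = line.strip()
        # Parameter lines look like "| name = value"; the opening
        # "{{Public art row" and closing "}}" lines are skipped.
        if not line.startswith("|") or "=" not in line:
            continue
        # Split on the first "=" only, so values containing "=" survive.
        name, _, value = line[1:].partition("=")
        params[name.strip()] = value.strip()
    return params
```

Because the split is made per line rather than on every pipe, values such as `{{Start date|1956}}` come through intact. Empty parameters are returned as empty strings.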

We can then include other identifiers for the artworks in Wikidata, and include the artworks’ Wikidata identifiers in other systems such as OpenStreetMap, so everything becomes available as linked, open data for others to reuse and build new apps and tools with.

United Kingdom parliamentary URL structure: change needed

In Wikidata, Wikipedia’s sister project for storing statements of fact as linked, open data, we record a number of unique identifiers.

For example, Tim Berners-Lee has the VIAF identifier “85312226” and is known to the Internet Movie Database (IMDb) as “nm3805083”.

We know that we can convert these to URLs by adding the prefixes:

  • https://viaf.org/viaf/
  • http://www.imdb.com/Name?

respectively. We only need to store those prefixes in Wikidata once each.
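In code, the idea is no more than string concatenation. This Python sketch uses the two prefixes listed above. The dictionary keys and the function name are made up for illustration; in Wikidata itself the prefix is stored against the relevant property.

```python
# Each prefix is stored once; the identifier is simply appended to it.
URL_PREFIXES = {
    "viaf": "https://viaf.org/viaf/",
    "imdb": "http://www.imdb.com/Name?",
}


def identifier_url(scheme, identifier):
    """Return the full URL for an identifier in the given scheme."""
    return URL_PREFIXES[scheme] + identifier
```

For Tim Berners-Lee, `identifier_url("viaf", "85312226")` gives `https://viaf.org/viaf/85312226`.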


HOUSES OF PARLIAMENT DSC 7057 pano 2

The Houses of Parliament in August 2014,
picture by Henry Kellner, CC BY-SA 3.0

The United Kingdom Parliament website also uses identifiers for MPs and members of the House of Lords.

For example, Tom Watson, an MP, is “1463”, and Jim Knight, aka The Lord Knight of Weymouth, is “4160”.

However, the respective URLs are:

meaning that the prefixes are not consistent, and require you to know the name or exact title.

Yet more ridiculous is that, if Tom Watson ever gets appointed to the House of Lords, even though his unique ID won’t change, the URL required to find his biography on the parliamentary website will change — and, because we don’t know whether he would be, say Lord Watson of Sandwell Valley, or Lord Watson of West Bromwich, we can’t predict what it will be.

When building databases, like Wikidata, this is all extremely unhelpful.

What we would like the parliamentary authorities to do — and what would benefit others wanting to make use of parliamentary URLs — is to use a standard, predictable type of URL, for example http://www.parliament.uk/biographies/1463 which uses the unique identifier, but does not require the individual’s house, name or title, and does not change if they shift to “the other place”.

If necessary they could then make that redirect to the longer URLs they prefer (though I wouldn’t recommend it).

I’ve asked them; but they don’t currently do this. In fact they explained their preference for the longer URLs thus:

…we are unable [sic] to shorten the url any further as the purpose of the current pattern of the web address is to display a pathway to the page.

The url also identifies the page i.e the indication of biographies including the name of the respective Member as to make it informative for online users who may view the page.

I find these arguments unconvincing, to say the least.


Screenshot, with Watson's name in the largest font on the page

There’s a big enough clue on the page, without needing to read the URL to identify its subject

Furthermore, the most verbose parts of the URLs are non-functioning; if we truncate Tom’s URL by simply dropping the final digit: http://www.parliament.uk/biographies/commons/tom-watson/146, then we get the biography of a different MP. On the other hand, if we change it to, say: http://www.parliament.uk/biographies/commons/t/1463, we still get Tom’s page. Try them for yourself.
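Put another way, only the trailing number matters. A short Python sketch, with hypothetical function names, shows how a data re-user might recover the identifier from either form of the URL and rebuild the shorter pattern suggested above. That shorter pattern is this post’s proposal, not something the parliamentary site is known to serve.

```python
from urllib.parse import urlparse


def member_id(url):
    """Return the trailing numeric identifier from a biography URL."""
    last_segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    if not last_segment.isdigit():
        raise ValueError(f"no numeric identifier at the end of {url!r}")
    return last_segment


def proposed_url(url):
    """Rebuild the URL in the shorter, identifier-only form proposed above."""
    return "http://www.parliament.uk/biographies/" + member_id(url)
```

Both `…/commons/tom-watson/1463` and `…/commons/t/1463` yield “1463”, which is the only part a database needs to store.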

So, how can we help the people running the Parliamentary website to change their minds, and to use a more helpful URL structure? Who do we need to persuade?

HLF licensing requirement considered harmful

This morning I attended a very interesting presentation on the availability, in the United Kingdom, of grant funding from the Heritage Lottery Fund (HLF), for digital heritage projects. I’ve previously worked as Wikipedian in Residence or as a Wikipedia consultant on HLF-funded projects*, helping to disseminate knowledge and content generated by those projects via Wikipedia, via Wikimedia Commons and via Wikidata.

Badge reads 'Birmingham Socialist A.R.P. Canteen fund' and has a drawing of ARP wardens being served at a mobile canteen

This fantastic image of a World War II badge was taken by Sasha Taylor during a Wikipedia editathon I ran as part of my HLF-funded residency at Thinktank, Birmingham Science Museum. Because Sasha was a volunteer, he’s not bound by HLF rules, so was able to use a CC BY-SA licence, and I was then able to add the image to Wikipedia articles. With an NC restriction, I couldn’t have done so.

As part of the presentation, it was proudly pointed out that the HLF’s current terms of funding include:

All digital outputs must be… licensed for use by others under the Creative Commons licence ‘Attribution Non-commercial’ (CC BY-NC) for the life of your contract with HLF, unless we have agreed otherwise

However, I’m really irked by this. I’ve written previously about what this means and why Wikipedia and its sister projects require content to be under a less restrictive licence, allowing for commercial reuse (briefly: people are allowed to reuse content from Wikipedia in commercial situations, for example in newspapers, or in apps which are sold for use on mobile devices). Others have written about why the NC restriction can be harmful.

Of course, mechanical copies of out-of-copyright works should be marked as such, and no attempt to claim copyright over them should be made.

In response to my question, it was confirmed that the terms prohibit less-restrictive licences, even if those doing the work wish to use them.

[Admittedly there is a work-around, which is to dual licence as both CC BY-NC and a less-restrictive version; which technically meets the letter of HLF’s requirement, but is actually nonsensical.]

I can see no earthly reason why HLF would insist on prohibiting a less restrictive licence, if the bodies they are funding choose to use one. If I’ve missed something, I’d be grateful for an explanation.

The phrase “for the life of your contract with HLF” is also nonsensical, since such licences are both indefinite and irrevocable.

I would like to see the above wording changed, to something like:

All digital outputs must be… licensed for use by others under the Creative Commons licence ‘Attribution Non-commercial’ (CC BY-NC) or a less restrictive licence (e.g. CC BY, CC BY-SA, or CC0), unless we have agreed otherwise

Better still, HLF could mandate an open licence, unless agreed otherwise.

How about it, HLF?

* If you’re bidding for HLF funding and would like advice about including a Wikipedia component, please drop me a line#.

# That might lead to someone paying me. Some would argue that that means I can’t use a NC-restricted image on this page.

Matching ORCID and other authority control identifiers in Wikidata BEACON

Further to my previous post on finding ORCID identifiers used in Wikidata & Wikipedia, Magnus Manske has released another useful gadget. “Wikidata BEACON” is a new tool that matches individuals’ (or other subjects’) entries in two different authority control systems. One of these, of course, can be ORCID.

For example, to find people who are listed in Wikidata, and have an ORCID identifier recorded there, and who also have, say, a VIAF identifier, or a MusicBrainz artist profile, choose one of those properties, then the other, from the two drop-down menus, then select “Get BEACON data”.

screenshot

Screenshot of Beacon, with ORCID and VIAF identifiers selected.

The result is returned as a pipe (“|”)-separated list, with the middle of the three columns being the Wikidata ID (in the format “Qnnn”) of the item concerned. (For the technically inclined, the format is BEACON, used to enable third party data re-users to automate the conversion of identifier values into web links. You can see the part-URLs, to which the values must be appended, at the head of the results page, labelled #PREFIX and #TARGET.)

So, Bill Thompson, for instance, appears as:

4426461|Q4911143|0000-0003-4402-5296

showing, respectively, his VIAF (4426461), Wikidata (Q4911143), and ORCID (0000-0003-4402-5296) identifiers.
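For anyone wanting to reuse the output, a line like that is easy to split. The Python sketch below assumes the three-column layout described above and treats lines beginning with “#” (such as #PREFIX and #TARGET) as headers to skip. The class and function names are invented for this example.

```python
from typing import NamedTuple


class BeaconLine(NamedTuple):
    source_id: str    # first column, e.g. the VIAF identifier
    wikidata_id: str  # middle column, in the format "Qnnn"
    target_id: str    # third column, e.g. the ORCID identifier


def parse_beacon_line(line):
    """Split one pipe-separated result line into its three columns."""
    fields = line.strip().split("|")
    if len(fields) != 3:
        raise ValueError(f"expected three columns, got {len(fields)}: {line!r}")
    return BeaconLine(*fields)


def parse_beacon(text):
    """Parse a whole result page, skipping blank lines and '#' header lines."""
    return [
        parse_beacon_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
```

Applied to Bill Thompson’s line, `parse_beacon_line` returns a record whose `wikidata_id` is “Q4911143”.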

A query can also be made in the form of a URL, for example this one:

https://tools.wmflabs.org/wikidata-todo/beacon.php?prop=496&source=214

in which “496” is taken from Wikidata’s property code for an ORCID identifier (P496) and “214” from that for a VIAF identifier (P214).

Another example is:

https://tools.wmflabs.org/wikidata-todo/beacon.php?prop=661&source=373

which shows the identifiers of chemicals in the Royal Society of Chemistry’s ChemSpider database and the matching Wikimedia Commons categories.

Similarly:

https://tools.wmflabs.org/wikidata-todo/beacon.php?prop=827&source=345

matches the BBC and Internet Movie Database (IMDb) identifiers of television programmes.
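Query URLs of this kind can be assembled programmatically. This Python sketch assumes only what the three examples above show: an endpoint taking `prop` and `source` parameters, each a Wikidata property number without the leading “P”. The function name is hypothetical, and the tool’s address may since have changed.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example URLs above.
BEACON_ENDPOINT = "https://tools.wmflabs.org/wikidata-todo/beacon.php"


def beacon_url(prop, source):
    """Build a query URL from two Wikidata property numbers."""
    return BEACON_ENDPOINT + "?" + urlencode({"prop": prop, "source": source})
```

So `beacon_url(496, 214)` reproduces the first example, matching ORCID and VIAF identifiers.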

Beacon is a good illustration of the way in which Wikidata has become a hub linking disparate datasets about people, and other things; as described by Andrew Gray in “Wikidata identifiers and the ODNB – where next?”.

Day 8 (Wednesday) in Washington DC

I started yesterday by joining a guided walk around Arlington National Cemetery. The hostel I’m staying at offers regular tours to various venues and areas, led by local volunteers. A short Metro hop took us under the Potomac River and into Virginia, the state from which the cemetery overlooks Washington. In fact, from the top of the hill there, you can see three states, the third being Maryland.

Arlington is as sombre and as impressive as you would imagine, and impeccably maintained. As well as its famous military burials, dating back to the Civil War, it has the graves of John F Kennedy and wife Jackie, and his brother Edward. There’s also a monument to those lost in the Lockerbie bombing and a tomb for unknown soldiers from various conflicts. Before leaving, I was lucky enough to see a very smart, male Chipping Sparrow (Spizella passerina).

My next call was George Washington University, venue for the Wikimania conference. After a great lunch and yet more catching up with friends, I attended a “Wiki Loves Libraries” event, to which librarians from other institutions had been invited. I gave a lightning talk on QRpedia, and had some useful discussions about Authority Control and Wikidata.

Then it was time to return to the hostel, which I did by bike, to freshen and smarten up, before travelling to the Library of Congress for the formal reception event kindly sponsored by Google. Fine food and free beer certainly helped the mood, and I caught up with more people I’d met in Amsterdam, and met the official Archivist of the United States.

I particularly enjoyed viewing a display of significant American books, including The Legend of Sleepy Hollow, which was written in my home town, Birmingham, England!

The view of the sunset behind the Capitol building was breathtaking. A courtesy bus took us to Dupont Circle, where a group of Brits and one German-Namibian found a bar for more refreshments (I enjoyed the best beer I’ve had here so far, an oatmeal porter), and put the world, or at least Wikipedia, to rights. There are some very interesting issues for Wikipedians in Namibia, where internet connectivity is patchy, and where source material is often not readily available. Work is underway to provide a self-contained, offline version of Wikipedia for schools there. We also enjoyed some pretty loud rock music, including the sublime ‘Freebird’. The 18-year-old me who bought that on a 12″ single would never have dreamed he’d one day listen to it in an American dive bar.