In Wikidata, Wikipedia’s sister project for storing statements of fact as linked, open data, we record a number of unique identifiers.
For example, Tim Berners-Lee has the VIAF identifier “85312226” and is known to the Internet Movie Database as “nm3805083”.
We know that we can convert these to URLs by adding a prefix, so
- 85312226 becomes https://viaf.org/viaf/85312226/
- nm3805083 becomes http://www.imdb.com/Name?nm3805083
using the prefixes:
- https://viaf.org/viaf/
- http://www.imdb.com/Name?
respectively. We only need to store those prefixes in Wikidata once each.
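As a minimal sketch of that idea in Python: the identifiers and URLs are the ones quoted above, while the dictionary keys and the function name are made up for illustration. Because the VIAF example ends with a trailing slash, the sketch stores a template with a `$1` placeholder (the convention Wikidata uses for formatter URLs) rather than a bare prefix.

```python
# URL templates for the two identifier schemes mentioned above.
# "$1" marks where the identifier goes; the keys "viaf" and "imdb"
# are illustrative labels, not Wikidata property names.
URL_TEMPLATES = {
    "viaf": "https://viaf.org/viaf/$1/",
    "imdb": "http://www.imdb.com/Name?$1",
}


def identifier_to_url(scheme: str, identifier: str) -> str:
    """Expand a stored template into a full URL for one identifier."""
    return URL_TEMPLATES[scheme].replace("$1", identifier)


if __name__ == "__main__":
    print(identifier_to_url("viaf", "85312226"))
    print(identifier_to_url("imdb", "nm3805083"))
```

Each template is stored once, and every identifier of that type can then be turned into a link without knowing anything else about the person.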
Image: The Houses of Parliament in August 2014 (picture by Henry Kellner, CC BY-SA 3.0)
The United Kingdom Parliament website also uses identifiers for MPs and members of the House of Lords.
For example, Tom Watson, an MP, is “1463”, and Jim Knight, aka The Lord Knight of Weymouth, is “4160”.
However, the respective URLs are:
- http://www.parliament.uk/biographies/commons/tom-watson/1463
- http://www.parliament.uk/biographies/lords/lord-knight-of-weymouth/4160
meaning that the prefixes are not consistent, and require you to know the name or exact title.
Yet more ridiculous is that, if Tom Watson is ever appointed to the House of Lords, the URL required to find his biography on the parliamentary website will change, even though his unique ID won’t. And because we don’t know whether he would be, say, Lord Watson of Sandwell Valley or Lord Watson of West Bromwich, we can’t predict what the new URL will be.
When building databases, like Wikidata, this is all extremely unhelpful.
What we would like the parliamentary authorities to do — and what would benefit others wanting to make use of parliamentary URLs — is to use a standard, predictable type of URL, for example http://www.parliament.uk/biographies/1463 which uses the unique identifier, but does not require the individual’s house, name or title, and does not change if they shift to “the other place”.
If necessary they could then make that redirect to the longer URLs they prefer (though I wouldn’t recommend it).
I’ve asked them, but they don’t currently do this. In fact, they explained their preference for the longer URLs thus:
…we are unable [sic] to shorten the url any further as the purpose of the current pattern of the web address is to display a pathway to the page.
The url also identifies the page i.e the indication of biographies including the name of the respective Member as to make it informative for online users who may view the page.
I find these arguments unconvincing, to say the least.
There’s a big enough clue on the page, without needing to read the URL to identify its subject.
Furthermore, the most verbose parts of the URLs are non-functioning; if we truncate Tom’s URL by simply dropping the final digit: http://www.parliament.uk/biographies/commons/tom-watson/146, then we get the biography of a different MP. On the other hand, if we change it to, say: http://www.parliament.uk/biographies/commons/t/1463, we still get Tom’s page. Try them for yourself.
So, how can we help the people running the Parliamentary website to change their minds, and to use a more helpful URL structure? Who do we need to persuade?
I had a play with those URLs and discovered that the site doesn’t pay complete attention to the slugs for house and name. If you get the house (i.e. ‘lords’ or ‘commons’) wrong, it’ll redirect to the corrected, complete URL. If you get just the name wrong, it’ll give you the right page, but not update the URL.
So, http://www.parliament.uk/biographies/commons/rupert-murdoch/1463 will show you the page for Tom Watson, but not correct the URL. http://www.parliament.uk/biographies/lords/rupert-murdoch/1463 will redirect to the canonical URL and show you the right page.
A bit inconsistent, but at least you know your links will always work (even if the URLs you provide are misleading). I suppose you could write a bot that checks for them in both houses to find the canonical URL and so update Wikidata with it when it changes.
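A rough sketch of how such a bot might pick out the canonical URL, in Python. It relies only on the behaviour I observed above (a wrong house redirects, a wrong name does not), which may of course change. The function names and the placeholder slug "x" are my own inventions, and the actual HTTP fetching is left out: the caller is assumed to supply, for each probe URL, the URL it ended up at after following redirects.

```python
from typing import Dict, List, Optional

BASE = "http://www.parliament.uk/biographies"
HOUSES = ("commons", "lords")


def probe_urls(member_id: str, slug: str = "x") -> List[str]:
    """Build one probe URL per house, using a deliberately wrong name slug."""
    return [f"{BASE}/{house}/{slug}/{member_id}" for house in HOUSES]


def canonical_from_probes(final_urls: Dict[str, str]) -> Optional[str]:
    """Pick the canonical URL from {probe URL: URL after redirects}.

    Assumes the observed behaviour: only the probe with the wrong house
    is redirected, and it is redirected to the canonical URL.
    """
    for probe, final in final_urls.items():
        if final != probe:
            return final
    return None  # nothing redirected, so nothing can be inferred


if __name__ == "__main__":
    commons_probe, lords_probe = probe_urls("1463")
    # Simulated results, matching what I saw for Tom Watson's ID.
    observed = {
        commons_probe: commons_probe,
        lords_probe: f"{BASE}/commons/tom-watson/1463",
    }
    print(canonical_from_probes(observed))
```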
I also don’t see how it could be beyond the wit of whoever looks after it to extend it to support e.g. http://www.parliament.uk/biographies/1463, as you suggested.
They could flip the pattern, such that the human-facing URL becomes:
http://www.parliament.uk/biographies/1463/commons/tom-watson
The routing code (be it regexps and rewrites, or something more comprehensive) could then detect if the path component immediately following /biographies/ was numeric and locate the biography entry based upon that ID if so. For bonus points, return a 301 if the remaining path components don’t match the canonical form.
If the path component following /biographies/ is not numeric, then they could either fall through to the existing parsing logic, or (if the canonical form is modified to the above), extract the ID and redirect.
The outcome is that you can link to
http://www.parliament.uk/biographies/1463
…with impunity, old links don’t break, and the NFRs defined by parliament.uk remain satisfied.
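To make that concrete, here is a small Python sketch of the routing logic described above. It is not how parliament.uk works: the `MEMBERS` table, the function names and the (status, location) return shape are all hypothetical, and a real site would look members up in its own database.

```python
from typing import Optional, Tuple

PREFIX = "/biographies/"

# Hypothetical lookup table: member ID -> (house, name slug).
MEMBERS = {
    "1463": ("commons", "tom-watson"),
    "4160": ("lords", "lord-knight-of-weymouth"),
}


def canonical_path(member_id: str) -> str:
    """Flipped canonical form: /biographies/<id>/<house>/<slug>."""
    house, slug = MEMBERS[member_id]
    return f"{PREFIX}{member_id}/{house}/{slug}"


def route(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, canonical path) for a requested path."""
    if not path.startswith(PREFIX):
        return 404, None
    parts = [p for p in path[len(PREFIX):].split("/") if p]
    if not parts:
        return 404, None
    if parts[0].isdigit():
        member_id = parts[0]       # new style: the ID comes first
    elif parts[-1].isdigit():
        member_id = parts[-1]      # legacy style: the ID comes last
    else:
        return 404, None
    if member_id not in MEMBERS:
        return 404, None
    target = canonical_path(member_id)
    if path.rstrip("/") == target:
        return 200, target         # already canonical, serve the page
    return 301, target             # anything else redirects permanently


if __name__ == "__main__":
    print(route("/biographies/1463"))
    print(route("/biographies/commons/tom-watson/1463"))
```

In this sketch the bare ID, the legacy URL and a URL with the wrong house or name all end up at the same canonical address, which is the property that matters for anyone storing links.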
Take a look at http://myparliament.info. This has the pattern for biogs that I believe you’re after.
e.g.
http://myparliament.info/Member/172
http://myparliament.info/Member/1467
and so on
Thank you, Matt. I wonder why Parliament’s (anonymous) webmaster didn’t mention that, when they replied to my email query (i.e. in the reply which I quote in part, above)?
Alas, the new pages, good though they are, do not appear to have the full info found in the earlier ones (such as Tom Watson’s election in 2001, or his interest in Japan), nor to link to them. Are they intended to be an eventual replacement, or will they continue to exist in parallel?
They’re a parallel thing, a driver to highlight how our data can be used in other ways.
http://myparliament.info/Member/1463 – the biog tab mentions his interest in Japan; it’s all in plain English, so you need to scan through it to see.
Will add his prior elections/contested in the next day or two.
A basic rule of persistent URI minting is not to include any names, because they change, so these parliamentary URLs are clearly sub-optimal. The fact that there clearly *is* some logic being used to do *some* redirection suggests that it wouldn’t be hard to fix. And, as others have said, your suggestion of http://www.parliament.uk/biographies/1463 is the best one. Maybe it’s to do with the fact that we’re talking here about biographies? I think I’d rather see URIs that identify the person, ORCID-style, like http://www.parliament.uk/id/1463, which would then return info about where to find the biography, which could be ephemeral.
My musings on URI persistence have been around for a while but still seem relevant.
Good point, Phil – the current URLs will need to change (or will become embarrassingly stale) if an MP changes name on, say, marriage or divorce, or for any other reason.
Hello again
Just wanted to give an early glimpse of some of the URI patterns we’re planning to use on the new Parliament website. Basic pattern is /type-of-thing/:thing.id
Where a thing ID is an opaque 8-character identifier. So:
https://beta.parliament.uk/people/7TX8ySd4
is Diane Abbott
https://beta.parliament.uk/houses/KL2k1BGP
is the House of Commons
https://beta.parliament.uk/parties/Dit41UlB
is the Green Party
https://beta.parliament.uk/constituencies/KO6xjpMd
is the current Runnymede and Weybridge constituency
etc
The identifiers are not guaranteed to remain stable until we come out of beta, but hopefully they give a decent indication of the direction of travel.
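For anyone wanting to consume these, a minimal Python sketch of parsing the pattern might look like the following. The regular expression is an assumption based only on the four examples above (a lowercase type segment, then exactly eight letters or digits), and the function name is illustrative.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed shape of /type-of-thing/:thing.id, inferred from the examples:
# a lowercase (possibly hyphenated) type, then an 8-character opaque ID.
PATH_PATTERN = re.compile(r"/(?P<kind>[a-z-]+)/(?P<thing_id>[A-Za-z0-9]{8})")


def parse_thing_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (type of thing, thing ID), or None if the URL doesn't fit."""
    match = PATH_PATTERN.fullmatch(urlparse(url).path)
    if match is None:
        return None
    return match.group("kind"), match.group("thing_id")


if __name__ == "__main__":
    print(parse_thing_url("https://beta.parliament.uk/people/7TX8ySd4"))
    print(parse_thing_url("https://beta.parliament.uk/houses/KL2k1BGP"))
```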
Current URL patterns are listed here:
https://github.com/ukparliament/ontologies/blob/master/urls.csv
All baby steps and lots more to do but a start
Michael: thank you for the update. It’s good to see some movement on this.
I particularly like your “way to look up a person from a foreign identifier eg /lookup?source=wikidata&id=Q574980” (in the GitHub document).
I look forward to hearing this has gone live – good luck with the roll-out!