United Kingdom parliamentary URL structure: change needed

In Wikidata, Wikipedia’s sister project for storing statements of fact, we record a number of unique identifiers.

For example, Tim Berners-Lee has the VIAF identifier “85312226” and is known to the Internet Movie Database (IMDb) as “nm3805083”.

We know that we can convert these identifiers to URLs by adding the prefixes:

  • https://viaf.org/viaf/
  • http://www.imdb.com/Name?

respectively. We only need to store those prefixes in Wikidata once each.
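To make the prefix idea concrete, here is a minimal Python sketch. The dictionary keys ("viaf", "imdb") and function names are hypothetical labels for illustration, not Wikidata property IDs. Wikidata itself stores such patterns as "formatter URL" values in which $1 marks where the identifier goes; a plain prefix is simply the case where $1 comes last.

```python
# Hypothetical labels mapped to the two prefixes listed above.
PREFIXES = {
    "viaf": "https://viaf.org/viaf/",
    "imdb": "http://www.imdb.com/Name?",
}


def identifier_url(scheme: str, identifier: str) -> str:
    """Build a URL by prepending the single stored prefix for this scheme."""
    return PREFIXES[scheme] + identifier


def apply_formatter(formatter: str, identifier: str) -> str:
    """Substitute an identifier into a Wikidata-style formatter URL containing $1."""
    return formatter.replace("$1", identifier)


if __name__ == "__main__":
    print(identifier_url("viaf", "85312226"))
    print(identifier_url("imdb", "nm3805083"))
    print(apply_formatter("https://viaf.org/viaf/$1", "85312226"))
```

Because each prefix is stored once, every identifier of that type becomes a link with no per-person data beyond the ID itself.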


The Houses of Parliament in August 2014, picture by Henry Kellner, CC BY-SA 3.0

The United Kingdom Parliament website also uses identifiers for MPs and members of the House of Lords.

For example, Tom Watson, an MP, is “1463”, and Jim Knight, aka The Lord Knight of Weymouth, is “4160”.

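However, Tom Watson’s biography is at http://www.parliament.uk/biographies/commons/tom-watson/1463, while Lord Knight’s sits under http://www.parliament.uk/biographies/lords/ with his name and title in the path, meaning that the prefixes are not consistent, and that building a URL requires you to know the member’s house and name or exact title.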

Yet more ridiculous is that if Tom Watson is ever appointed to the House of Lords, the URL required to find his biography on the parliamentary website will change, even though his unique ID won’t. And because we don’t know whether he would be, say, Lord Watson of Sandwell Valley or Lord Watson of West Bromwich, we can’t predict what the new URL will be.

When building databases, like Wikidata, this is all extremely unhelpful.

What we would like the parliamentary authorities to do (and what would benefit others wanting to make use of parliamentary URLs) is to use a standard, predictable type of URL, for example http://www.parliament.uk/biographies/1463, which uses the unique identifier but does not require the individual’s house, name or title, and does not change if they shift to “the other place”.

If necessary they could then make that redirect to the longer URLs they prefer (though I wouldn’t recommend it).

I’ve asked them, but they don’t currently do this. In fact, they explained their preference for the longer URLs thus:

…we are unable [sic] to shorten the url any further as the purpose of the current pattern of the web address is to display a pathway to the page.

The url also identifies the page i.e the indication of biographies including the name of the respective Member as to make it informative for online users who may view the page.

I find these arguments unconvincing, to say the least.


Screenshot of Tom Watson’s biography page: with his name in the largest font on the page, there’s a big enough clue to identify its subject without needing to read the URL.

Furthermore, the most verbose parts of the URLs are non-functioning; if we truncate Tom’s URL by simply dropping the final digit: http://www.parliament.uk/biographies/commons/tom-watson/146, then we get the biography of a different MP. On the other hand, if we change it to, say: http://www.parliament.uk/biographies/commons/t/1463, we still get Tom’s page. Try them for yourself.
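The experiment above suggests that only the trailing number matters. A minimal Python sketch of that observation pulls the member ID out of a biography URL and rewrites it into the shorter form suggested earlier. It assumes every biography URL has the /biographies/{house}/{slug}/{id} shape seen in Tom Watson’s URL; the pattern and function names are hypothetical, and nothing here queries the live site.

```python
import re
from urllib.parse import urlparse

# Assumed shape: /biographies/<commons|lords>/<any slug>/<numeric id>
BIOGRAPHY_PATH = re.compile(r"^/biographies/(?:commons|lords)/[^/]+/(\d+)/?$")


def member_id(url: str) -> str:
    """Return the numeric member ID, ignoring the house and name slug."""
    match = BIOGRAPHY_PATH.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"not a recognised biography URL: {url}")
    return match.group(1)


def proposed_url(url: str) -> str:
    """Rewrite a biography URL into the house- and name-free form."""
    return "http://www.parliament.uk/biographies/" + member_id(url)
```

For example, both http://www.parliament.uk/biographies/commons/tom-watson/1463 and http://www.parliament.uk/biographies/commons/t/1463 reduce to the same ID, "1463", which is exactly what the live site appears to do.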

So, how can we help the people running the Parliamentary website to change their minds, and to use a more helpful URL structure? Who do we need to persuade?

9 thoughts on “United Kingdom parliamentary URL structure: change needed”

  1. Alexander Dutton

    I had a play with those URLs and discovered that it doesn’t pay complete attention to the slugs for house and name. If you get the house (i.e. ‘lords’ or ‘commons’) wrong, it’ll redirect to correct the whole URL. If you get just the name wrong, it’ll give you the right page, but not update the URL.

    So, http://www.parliament.uk/biographies/commons/rupert-murdoch/1463 will show you the page for Tom Watson, but not correct the URL. http://www.parliament.uk/biographies/lords/rupert-murdoch/1463 will redirect to the canonical URL and show you the right page.

    A bit inconsistent, but at least you know your links will always work (even if the URLs you provide are misleading). I suppose you could write a bot that checks for them in both houses to find the canonical URL and so update Wikidata with it when it changes.

    I also don’t see how it could be beyond the wit of whoever looks after it to extend it to support e.g. http://www.parliament.uk/biographies/1463, as you suggested.

  2. Mo

    They could flip the pattern, such that the human-facing URL becomes:

    http://www.parliament.uk/biographies/1463/commons/tom-watson

    The routing code (be it regexps and rewrites, or something more comprehensive) could then detect if the path component immediately following /biographies/ was numeric and locate the biography entry based upon that ID if so. For bonus points, return a 301 if the remaining path components don’t match the canonical form.

    If the path component following /biographies/ is not numeric, then they could either fall through to the existing parsing logic, or (if the canonical form is modified to the above), extract the ID and redirect.

    The outcome is that you can link to

    http://www.parliament.uk/biographies/1463

    …with impunity, old links don’t break, and the NFRs defined by parliament.uk remain satisfied.

    1. Andy Mabbett Post author

      Thank you, Matt. I wonder why Parliament’s (anonymous) webmaster didn’t mention that, when they replied to my email query (i.e. in the reply which I quote in part, above)?

      Alas, the new pages, good though they are, do not appear to have the full info found in the earlier ones (such as Tom Watson’s election in 2001, or his interest in Japan), nor to link to them. Are they intended to be an eventual replacement, or will they continue to exist in parallel?

      1. Matt

        They’re a parallel thing, a driver to highlight how our data can be used in other ways.

        http://myparliament.info/Member/1463 – the biog tab mentions his interest in Japan; it’s all in plain English, so you need to scan through it to see.

        Will add his prior elections/contested in the next day or two.

  3. Phil Archer

    A basic rule of persistent URI minting is not to include any names – they change – so these parliamentary URLs are clearly sub-optimal. The fact that there clearly *is* some logic being used to do *some* redirection suggests that it wouldn’t be hard to fix. And, as others have said, your suggestion of http://www.parliament.uk/biographies/1463 is the best one. Maybe it’s to do with the fact that we’re talking here about biographies? I think I’d rather see URIs that identify the person, like http://www.parliament.uk/id/1463 (ORCID-style), and then that would return info about where to find the biography, which could be ephemeral.

    My musings on URI persistence have been around for a while but still seem relevant.

  4. Michael Smethurst

    Hello again

    Just wanted to give an early glimpse of some of the URI patterns we’re planning to use on the new Parliament website. The basic pattern is /type-of-thing/:thing.id, where a thing ID is an opaque 8-character identifier. So:

    https://beta.parliament.uk/people/7TX8ySd4 is Diane Abbott
    https://beta.parliament.uk/houses/KL2k1BGP is the House of Commons
    https://beta.parliament.uk/parties/Dit41UlB is the Green Party
    https://beta.parliament.uk/constituencies/KO6xjpMd is the current Runnymede and Weybridge constituency
    etc.

    The identifiers are not guaranteed to remain stable until we come out of beta, but hopefully they give a decent indication of the direction of travel.

    Current URL patterns are listed here:
    https://github.com/ukparliament/ontologies/blob/master/urls.csv

    All baby steps, and lots more to do, but a start.

  5. Andy Mabbett Post author

    Michael: thank you for the update. It’s good to see some movement on this.

    I particularly like your “way to look up a person from a foreign identifier eg /lookup?source=wikidata&id=Q574980” (in the Github document).

    I look forward to hearing this has gone live – good luck with the roll-out!

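The bot Alexander Dutton suggests could work roughly like this Python sketch. It assumes the site still behaves as he reported (a wrong house triggers a redirect to the full canonical URL, while a wrong name with the right house is served without one); the "placeholder" slug and the function names are hypothetical. Paths rather than whole URLs are compared so that a blanket http-to-https upgrade isn’t mistaken for a correction.

```python
from typing import Callable, Optional
from urllib.parse import urlparse
from urllib.request import urlopen

BASE = "http://www.parliament.uk/biographies"


def final_url(url: str) -> str:
    """Fetch url, letting urllib follow any redirects, and return where we ended up."""
    with urlopen(url) as response:
        return response.geturl()


def find_canonical(member_id: str,
                   resolve: Callable[[str], str] = final_url) -> Optional[str]:
    """Probe both houses with a placeholder slug; return the canonical URL if found.

    Assumes only the wrong-house probe is redirected, and that it lands on
    the canonical URL, as reported in the comment thread.
    """
    for house in ("commons", "lords"):
        probe = f"{BASE}/{house}/placeholder/{member_id}"
        landed = resolve(probe)
        # A changed path means the site corrected us: that is the canonical URL.
        if urlparse(landed).path != urlparse(probe).path:
            return landed
    return None
```

Passing a fake `resolve` function makes the logic testable without touching the network; a bot could then compare the result with the URL stored in Wikidata and update it when it changes.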
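Mo’s flipped routing could be sketched in framework-free Python as follows. This is an illustration, not how parliament.uk actually works: it assumes a hypothetical in-memory `MEMBERS` table mapping IDs to house and slug, and it returns a status code and path instead of wiring into a real web server.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup table: member ID -> (house, name slug).
MEMBERS: Dict[str, Tuple[str, str]] = {
    "1463": ("commons", "tom-watson"),
}


def canonical_path(member_id: str, house: str, slug: str) -> str:
    # Flipped pattern: the stable ID first, human-friendly parts after it.
    return f"/biographies/{member_id}/{house}/{slug}"


def route(path: str,
          members: Dict[str, Tuple[str, str]] = MEMBERS) -> Tuple[int, str]:
    """Return (HTTP status, canonical path) for a request under /biographies/."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "biographies":
        return 404, path
    rest = parts[1:]
    trailing: Optional[List[str]]
    if rest[0].isdigit():
        # New form: /biographies/<id>[/<house>/<slug>]
        member_id, trailing = rest[0], rest[1:]
    elif rest[-1].isdigit():
        # Legacy form /biographies/<house>/<slug>/<id>: extract the ID and redirect.
        member_id, trailing = rest[-1], None
    else:
        return 404, path
    if member_id not in members:
        return 404, path
    house, slug = members[member_id]
    target = canonical_path(member_id, house, slug)
    if trailing == [house, slug]:
        return 200, target  # already canonical: serve the page
    return 301, target      # anything else: permanent redirect to the canonical form
```

With this, /biographies/1463 and the old /biographies/commons/tom-watson/1463 both end up at the same page, so short links work and existing links don’t break.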
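Michael Smethurst’s /type-of-thing/:thing.id pattern, and the foreign-identifier lookup quoted from the GitHub document, might be handled along these lines. This Python sketch assumes IDs are exactly eight ASCII letters or digits (as in the beta examples) and that the lookup lives on the beta.parliament.uk host; both are assumptions, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlencode, urlparse

# Assumption: a lowercase type name followed by an 8-character alphanumeric ID.
THING_PATH = re.compile(r"^/([a-z-]+)/([A-Za-z0-9]{8})$")


def parse_thing(url: str) -> Optional[Tuple[str, str]]:
    """Split a /type-of-thing/:thing.id URL into (type, id), or None if it doesn't fit."""
    match = THING_PATH.match(urlparse(url).path)
    return (match.group(1), match.group(2)) if match else None


def lookup_url(source: str, foreign_id: str,
               base: str = "https://beta.parliament.uk") -> str:
    """Build a lookup URL of the form /lookup?source=...&id=..."""
    return f"{base}/lookup?" + urlencode({"source": source, "id": foreign_id})
```

Because the type and ID are the whole path, nothing in such a URL depends on a member’s house, name or title.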
