Category Archives: ideas

My Open Data Challenge to UK Local Government: a Wikipedia Page for Every Council

At yesterday’s excellent West Midlands “Open Data: Challenges & Opportunities” event, hosted by the West Midlands Regional Observatory, Chris Taggart, who runs the very useful Openly Local website (which aggregates data about councils and their elected members), mentioned the problems he has extracting linked data about councils from Wikipedia, via DBpedia. The cause is that Wikipedia tends to conflate places with their local authorities.

See, for example, the Wikipedia article on the Metropolitan Borough of Dudley; or those on , which (at the time of writing) has only a small section on its town council and Lichfield district (so a challenge there for Stuart Harrison, , and his colleagues!); and compare them with the separate articles about and ; or , the , and . The former, all-on-one-page, pattern is far more common. (Disclosure: I created some, and have edited all, of those articles.)

I suggested at the event that this problem could be solved if staff from each UK council simply started a Wikipedia article about their council, where none already exists.

As each UK council is, inherently, (to use the Wikipedia jargon) notable, there should be no issue with this, provided that they are mindful of Wikipedia’s policy on conflicts of interest (which explicitly allows for such editing), and the requirement that articles maintain a neutral point-of-view, and be referenced. Short “stub” articles can be created in the first instance.

(If council staff are hesitant to do so themselves, then I can help to pair them up with volunteer Wikipedia editors who will assist them, or create articles directly.)

Update: Added Dudley & Lichfield district examples.

Lists in Microformats: Suggested Optimisation

Based on my extensive experience of applying microformats to templates in Wikipedia (and other MediaWiki installations) I’ve come to the following conclusion…

For attributes which can occur more than once (such as nickname or category in hCard), lists having that property, or lists in a container having that property, should be parsed as lists of individual instances of that property.

For example:

<div class="category">
<ul>
<li>ornithologist</li>
<li>driver</li>
<li>gardener</li>
</ul>
</div>

and:

<ul class="category">
<li>ornithologist</li>
<li>driver</li>
<li>gardener</li>
</ul>

should be treated as equivalent to:

<ul>
<li class="category">ornithologist</li>
<li class="category">driver</li>
<li class="category">gardener</li>
</ul>
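
As a rough sketch of how a parser might apply this rule (not a description of any existing microformats parser), the Python below uses the standard library's html.parser to treat all three patterns alike. The class and function names are hypothetical, only list items are handled, and the markup is assumed to be well-formed.

```python
from html.parser import HTMLParser


class CategoryListParser(HTMLParser):
    """Collect one 'category' value per list item (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.categories = []
        self._stack = []       # (tag, has_category_class) for each open element
        self._in_item = False  # True while inside a qualifying <li>
        self._text = []

    def _inside_category(self):
        # True if any open element (including the current one) has the class.
        return any(flag for _tag, flag in self._stack)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append((tag, "category" in classes))
        if tag == "li" and self._inside_category():
            self._in_item = True
            self._text = []

    def handle_data(self, data):
        if self._in_item:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            value = "".join(self._text).strip()
            if value:
                self.categories.append(value)
            self._in_item = False
        # Pop back to the matching open tag (assumes well-formed markup).
        while self._stack:
            open_tag, _flag = self._stack.pop()
            if open_tag == tag:
                break


def parse_categories(html):
    parser = CategoryListParser()
    parser.feed(html)
    parser.close()
    return parser.categories
```

Fed any of the three snippets above, parse_categories returns the same three values, which is the equivalence being proposed.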

Twitter: A microformat in lieu of a protocol

In May of this year I wrote about the problem of the URL for a given Twitter user’s profile, or for an individual post or “status”, being different depending on the Twitter client in use. I suggested a new protocol for Twitter links. [You might want to read that, before the rest of this post]. I can’t believe I didn’t think of this simpler solution sooner!

The answer (in the short term) is to use a microformat (or a microformat-like “poshformat”, if you prefer to call it that) for each case. Let’s say we use the classes twitter-user & twitter-status.

User-agents (that’s jargon for browsers) could then employ a script (such as those used by GreaseMonkey, or a Firefox extension) to ignore the encoded URL and substitute the equivalent for the user’s preferred Twitter client instead.

For links to user profiles:

<a
href="http://twitter.com/pigsonthewing">
Andy Mabbett
</a>

would become:

<a
class="twitter-user"
href="http://twitter.com/pigsonthewing">
Andy Mabbett
</a>

and:

<a
href="http://accessibletwitter.com/app/user.php?uid=pigsonthewing">
Andy Mabbett</a>

would become:

<a
class="twitter-user"
href="http://accessibletwitter.com/app/user.php?uid=pigsonthewing">
Andy Mabbett</a>

Likewise, for individual statuses:

<a
href="http://twitter.com/pigsonthewing/status/1828036334">
something witty</a>

would become:

<a
class="twitter-status"
href="http://twitter.com/pigsonthewing/status/1828036334">
something witty</a>

and:

<a
href="http://accessibletwitter.com/app/status.php?1828036334">
something witty</a>

would become:

<a
class="twitter-status"
href="http://accessibletwitter.com/app/status.php?1828036334">
something witty</a>

and:

<a
href="http://m.slandr.net/single.php?id=1828036334">
something witty</a>

would become:

<a
class="twitter-status"
href="http://m.slandr.net/single.php?id=1828036334">
something witty</a>

To simplify matters, the rules for extracting the user ID or the status ID could be the same in both cases:

  1. Parse the value of the href attribute of the element to which the class applies.
  2. If there is an equals sign, use everything after that.
  3. Otherwise, if there is a question mark, use everything after that.
  4. Otherwise, use everything after the last slash.

That would deal with all the examples in my earlier post.
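
A minimal Python sketch of these rules follows. The function name is hypothetical, and the sketch tests for an equals sign before a question mark, so that an href containing both (such as user.php?uid=pigsonthewing) yields the bare ID rather than uid=pigsonthewing.

```python
def extract_twitter_id(href):
    """Return the user ID or status ID encoded in a link's href.

    Hypothetical helper: equals sign first, then question mark,
    then the last slash.
    """
    href = href.strip()
    if "=" in href:
        # Everything after the (first) equals sign.
        return href.split("=", 1)[1]
    if "?" in href:
        # Everything after the question mark.
        return href.split("?", 1)[1]
    # Everything after the last slash, ignoring a trailing one.
    return href.rstrip("/").rsplit("/", 1)[-1]
```

The same function serves both classes; the class on the link (twitter-user or twitter-status) tells the user-agent which kind of ID it has just extracted.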

So, if you’re using a user-agent which is aware of this microformat, and find on a page:

<a
class="twitter-user"
href="http://twitter.com/pigsonthewing">
Andy Mabbett</a>
said
<a
class="twitter-status"
href="http://m.slandr.net/single.php?id=1828036334">
something witty</a>

but your preferred Twitter client is Dabr (one I recommend, BTW!) then your browser would treat (and possibly render) that as:

<a
class="twitter-user"
href="http://dabr.co.uk/user/pigsonthewing">
Andy Mabbett</a>
said
<a
class="twitter-status"
href="http://dabr.co.uk/status/1828036334">
something witty</a>

Simples!
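
Once the ID has been extracted from the href, the substitution itself is little more than a template lookup. The Python sketch below assumes the Dabr URL patterns shown in the example above (with an http:// scheme assumed); the names DABR_TEMPLATES and preferred_href are hypothetical, and a real user-agent script would carry one such table for each client.

```python
# URL templates for the user's preferred client, keyed by class name.
# The Dabr patterns follow the example in this post; the scheme is assumed.
DABR_TEMPLATES = {
    "twitter-user": "http://dabr.co.uk/user/{id}",
    "twitter-status": "http://dabr.co.uk/status/{id}",
}


def preferred_href(class_attr, identifier, templates=DABR_TEMPLATES):
    """Return the preferred-client URL, or None if no known class is present."""
    for name in class_attr.split():
        if name in templates:
            return templates[name].format(id=identifier)
    return None
```

Links without either class return None, so a script can leave them untouched.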

Triple-tag references to Twitter posts

Further to my post about a protocol for Twitter posts, you can also triple-tag blog posts, Flickr images and similar web utterances which refer to a specific Twitter post (or status), like this: twitter:status=1975532392 – and this post is tagged with that!

[Update: See also my Flickr screenshot of a Twitter post, triple tagged with #twitter:status=1828036334 to reference the same post.]
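
For anyone wanting to consume such tags, a triple tag breaks down into a namespace, a predicate and a value. The Python sketch below shows one way to split them; the pattern and the names are my assumptions rather than any formal grammar, and an optional leading # is accepted.

```python
import re
from typing import NamedTuple, Optional


class TripleTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# namespace:predicate=value, with an optional leading "#" (assumed pattern).
_TRIPLE_TAG = re.compile(
    r"#?([a-z][a-z0-9_]*):([a-z][a-z0-9_]*)=(.+)", re.IGNORECASE
)


def parse_triple_tag(tag: str) -> Optional[TripleTag]:
    """Return the parts of a triple tag, or None if the tag is an ordinary one."""
    match = _TRIPLE_TAG.fullmatch(tag.strip())
    if match is None:
        return None
    return TripleTag(*match.groups())
```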

How microformat developments are blocked

The hCard microformat can distinguish between a person and an organisation, by the use of the org property:


<div class="vcard">
<span class="fn">Andy Mabbett</span>
</div>


<div class="vcard">
<span class="fn org">The Red Cross</span>
</div>

but it cannot distinguish between an organisation and a place:


<div class="vcard">
<span class="fn org">The Wembley Stadium fan club</span>
</div>


<div class="vcard">
<span class="fn org">Wembley Stadium</span>
</div>

treating them both as organisations.
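
To make the limitation concrete, here is a rough Python sketch of the hCard rule that a card whose org value equals its fn value is an organisation. The class and function names are hypothetical, and the sketch assumes well-formed markup with no void elements inside the card.

```python
from html.parser import HTMLParser


class HCardKindParser(HTMLParser):
    """Classify each vcard as 'person' or 'organisation' (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cards = []          # (fn, kind) for each vcard found
        self._stack = []         # class-name list of each open element
        self._card_depth = None  # stack depth of the current vcard root
        self._fn = []
        self._org = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)
        if "vcard" in classes and self._card_depth is None:
            self._card_depth = len(self._stack)
            self._fn, self._org = [], []

    def handle_data(self, data):
        if self._card_depth is None:
            return
        open_in_card = self._stack[self._card_depth - 1:]
        if any("fn" in classes for classes in open_in_card):
            self._fn.append(data)
        if any("org" in classes for classes in open_in_card):
            self._org.append(data)

    def handle_endtag(self, tag):
        if self._card_depth is not None and len(self._stack) == self._card_depth:
            fn = "".join(self._fn).strip()
            org = "".join(self._org).strip()
            # Same value for fn and org: the card is an organisation.
            kind = "organisation" if org and org == fn else "person"
            self.cards.append((fn, kind))
            self._card_depth = None
        if self._stack:
            self._stack.pop()


def classify_hcards(html):
    parser = HCardKindParser()
    parser.feed(html)
    parser.close()
    return parser.cards
```

Run over the four cards above, this labels the first a person and the other three organisations; nothing in the markup lets it single out Wembley Stadium as a place.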

On 31 December 2007, I described a way in which the hCard microformat could be used to differentiate between hCards for places and organisations.

On 9 January 2008, having received favourable comment, I made a formal proposal to update the hCard specification.

Despite this ten-day gap, Brian Suda, one of the microformats “admins”, the cabal who control microformats, complained that he’d only had two days to consider the matter, and that “More time is needed to fully look over the implications of this change.”

No objections to the method, nor issues with it, have been raised.

Toby Inkster’s superb microformats parser Swignition (formerly called “Cognition”) has supported the method since version 0.1-alpha8, released in May 2008.

One year on from my formal proposal, what changes have been made to the hCard specification, in this regard? None.

Update: Three years on from my formal proposal, what changes have been made to the hCard specification, in this regard? None.

Marking up the scientific names of living things

As any web manager worth their salt knows, it’s <span lang="fr">très important</span> that changes in language be marked up with HTML’s “lang” attribute, using an IETF language tag (such as “fr” for French, as shown above). This allows software like text readers for blind people to pronounce them correctly (instead of sounding like an outtake from ‘Allo ‘Allo!) and means that translation software can handle them appropriately.

But what happens when a page like this one includes the scientific (or taxonomic) name of a living thing, such as Circus cyaneus (the Hen Harrier)? It’s not English, and should not be translated, into, say, German, as Zirkus cyaneus.

It’s not really Latin, either, though some people mistakenly refer to scientific names as “Latin names”. Many of them are neologisms: new words with no real Latin content, based instead on Latinised Greek (for example Brachypelma albopilosum), people’s names (Ardeola grayii, in honour of John Edward Gray, a biologist), place names (Nepenthes sumatrana, from Sumatra), culture (Ba humbugi, a quote from Charles Dickens’ ‘A Christmas Carol’) or even humour (Phthiria relativitae, a play on “The Theory of Relativity”).

Back in 2003, on the IETF mailing list which discusses such language codes, I proposed that there should be a specific language code, or sub-code, so that scientific names such as these could be marked up and recognised by software. There wasn’t much interest (possibly because I made the proposal as an amateur, rather than a professional or academic taxonomist), and distractions in my work and domestic life meant that I didn’t, unfortunately, have time to pursue the matter.

However, the need for such a code has now been recognised by Gregor Hagedorn, of the Julius Kuehn Institute, Germany’s Federal Research Centre for Cultivated Plants, in Berlin, who has rekindled my proposal.

With the support of Gregor and other taxonomists, via the Taxacom mailing list, I’m hopeful we can at last make a case that such a code is needed.

Facebook should allow groups to be rationalised

I’ve just had a look on Facebook, for a group for people concerned about the nasty Phorm cyber-spying system. I found these:

  • Save UK internet privacy – reject ISPs that use Phorm (1,347 members)
  • Deny Phorm (48 members)
  • Arrest Ben Verwaayen for criminal offences under RIPA with regards to Phorm (26 members)
  • Fight back against PHORM (19 members)
  • Bad Phorm! (9 members)
  • Got Phorm? (7 members)
  • Stop ISP’s from breaching customers privacy!!!! (174 members)
  • Stop BT, TalkTalk, VirginMedia From Selling Your Web Browsing Information! (38 members)
  • Things you need to know about your Virgin Media/Blueyonder/NTL Broadband (21 members)

The situation is the same, or worse, for other subjects, too.

Firstly, I wonder what it is about people, that they set up a new group, rather than searching for, and joining, an existing one?

But, more importantly, Facebook needs some sort of mechanism to encourage such groups to merge and then, with the agreement of their members, to facilitate the merger.

Suggested method of publishing microformats in Twitter posts

Twitter posts like this one:

We’re still deep in the Sundarbans, near Tambulbunia, meeting experts on dolphins and tigers. l:Tambulbunia, Bangladesh=22.27722,89.71905

have a place name and corresponding coordinates (indicated by the prefix “l:”). This has allowed them to be plotted on a map.

It should be possible for the poster to send, say:

We’re still deep in the Sundarbans, near Tambulbunia, meeting experts on dolphins and tigers. #hcard: fn+locality:Tambulbunia: country-name:Bangladesh: geo:22.27722,89.71905

using colons as delimiters and have Twitter render that comment marked up as an hCard.
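
As a rough sketch of what the rendering side might do (whether at Twitter or on a third-party site), the Python below splits such a post on its colons and emits hCard markup. The function names are hypothetical; I’ve assumed that “+” joins class names on a single element, that geo is split into hCard’s latitude and longitude sub-properties, and that values never contain colons.

```python
from html import escape

MARKER = "#hcard:"


def split_hcard_post(post):
    """Return (plain_text, [(class_names, value), ...]) for a post."""
    text, marker, encoded = post.partition(MARKER)
    if not marker:
        return post, []
    tokens = [token.strip() for token in encoded.split(":")]
    # Tokens alternate between property name(s) and value.
    pairs = [
        (names.replace("+", " "), value)
        for names, value in zip(tokens[0::2], tokens[1::2])
    ]
    return text.strip(), pairs


def render_hcard(pairs):
    """Render the pairs as a flat hCard (a simplification of the full format)."""
    parts = []
    for class_names, value in pairs:
        if class_names == "geo" and "," in value:
            lat, lon = (part.strip() for part in value.split(",", 1))
            inner = (
                f'<span class="latitude">{escape(lat)}</span>, '
                f'<span class="longitude">{escape(lon)}</span>'
            )
        else:
            inner = escape(value)
        parts.append(f'<span class="{escape(class_names)}">{inner}</span>')
    return '<span class="vcard">' + " ".join(parts) + "</span>"
```

The text before the marker is returned separately, so it can be displayed as the ordinary post alongside the generated hCard.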

In the short term, this could be achieved by a third-party site, like #hashtags.

UPDATE: Being more mindful of the 140 character limit than I have in the above example, perhaps class names might be abbreviated (“loc” for “locality”, “ctry” for “country-name”, and so on).