Tag Archives: microformat

The BBC, Open Content and Wikipedia

I had a really interesting meeting with Robin Morley, the BBC’s social media lead for the English Regions, a couple of weeks ago. After giving me a tour of their premises in Birmingham’s Mailbox (where, in its former guise as Royal Mail’s Birmingham head office, my father Trevor had an office), he described the work he does.

We then discussed how his London colleagues automatically insert content from Wikipedia into the BBC website’s pages on wildlife (example: Barn Owl) and on music (example, of course, Pink Floyd). I contributed to the former by writing markup to make them emit the ‘species’ microformat, of which I’m also the author.

BBC article on Pink Floyd, including Wikipedia content (links to original article)

They are able to do this because all of Wikipedia’s content is available under a Creative Commons licence. In other words, anyone can reuse it, for free.

I suggested to Robin that his news staff could similarly reuse Wikipedia content. For example, the article “Birmingham Assay Office silver name plaque stolen”:

BBC Birmingham & Black Country article on a theft from Birmingham Assay Office (links to original article)

could use text from Wikipedia in a pullout (a sub-section, or box at the side of the article) which might say:

The Birmingham Assay Office is one of the four remaining assay offices in the United Kingdom.

It opened on 31 August 1773 and initially operated from three rooms in the King’s Head Inn on New Street employing only four staff and was only operating on a Tuesday. The first customer on that day was Matthew Boulton. The hallmark of the Birmingham Assay Office is the Anchor.

Services provided by the office include nickel testing, metal analysis, plating thickness determination, bullion certification, consultancy and gem certification.

Text in this section copyright Wikipedia authors, licensed CC BY-SA

All that would be required would be for credit to Wikipedia to be given, and the pullout text (but not the whole BBC article) to be made available under the same open licence, as above.

This could be done on articles about all sorts of topics: people, places, organisations, events and more, as well as sports reports.

Robin seemed to like the idea, so I’m looking forward to seeing how he and his colleagues make use of Wikipedia content.

Update: Another post, “The BBC, Regional News and Sport, and Hyperlocal Blogs”, about something else we discussed at our meeting, is now published.

Fixing Facebook’s Microformats (at their request)

Twitter, and the wider ‘blogosphere’, have been alive tonight (UK time) with people commenting on, or mostly simply repeating, the news that Facebook have implemented the hCalendar microformat on all their events, making them parsable by machines and thus easy to add to desktop or on-line calendars. They’ve also included hCard for venue details.

This is generally a good thing, but what most people — at least some of whom should have known better — failed to notice was that the implementation is broken.

Consider this event, a concert by my friends’ band, Treebeard:

which, as you can see, is on 18 February from 19:00–22:00 (7–10pm).

That’s encoded, in the Facebook page’s mark-up, as:

<span class="dtstart"><span class="value-title" title="2011-02-18T19:00:00-08:00"> </span>19:00</span> – <span class="dtend"><span class="value-title" title="2011-02-18T22:00:00-08:00"> </span>22:00</span>

The “-08:00” at the end of each date-time value represents a timezone 8 hours behind UTC (as we must now call Greenwich Mean Time). That would be correct were the event in California, or elsewhere in the Pacific Time Zone; but for the UK, the mark-up translates to 3–6am the following morning.
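The arithmetic is easy to check. Here is a minimal Python sketch, using only the standard library, that takes the two values from Facebook's mark-up and converts them to UTC (which is the same as UK time in February). The helper name as_utc is my own invention, not anything Facebook uses.

```python
from datetime import datetime, timezone

# Values copied from the title attributes in the Facebook mark-up.
published_start = "2011-02-18T19:00:00-08:00"
published_end = "2011-02-18T22:00:00-08:00"


def as_utc(value: str) -> datetime:
    """Parse an ISO 8601 date-time that carries an offset and convert it to UTC."""
    return datetime.fromisoformat(value).astimezone(timezone.utc)


start_utc = as_utc(published_start)
end_utc = as_utc(published_end)

print(start_utc.isoformat())  # 2011-02-19T03:00:00+00:00
print(end_utc.isoformat())    # 2011-02-19T06:00:00+00:00
```

A calendar application in the UK that trusts the published offset would therefore file the concert in the small hours of 19 February.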

Since the event is in the UK, the start time should be encoded as “2011-02-18T19:00:00+00:00”, which puts it in the correct UTC timezone (in British Summer Time, it would be “2011-02-18T19:00:00+01:00”). Ditto for the end time.

The same will apply for events in any other timezone on the planet, each with an appropriate adjustment.
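To illustrate, here is a rough Python sketch of how a publisher might attach the venue's own offset when writing out the date-time value. The function name encode_local and its parameters are hypothetical, and it assumes the publisher already knows the venue's offset from UTC on the day of the event; looking that offset up from a timezone database is outside the scope of the sketch.

```python
from datetime import datetime, timedelta, timezone


def encode_local(year, month, day, hour, minute, offset_hours):
    """Return an ISO 8601 date-time string carrying the venue's own UTC offset.

    offset_hours is the venue's offset from UTC on that date
    (0 for the UK in winter, 1 during British Summer Time, -8 for Pacific Standard Time).
    """
    venue_tz = timezone(timedelta(hours=offset_hours))
    return datetime(year, month, day, hour, minute, tzinfo=venue_tz).isoformat()


uk_winter = encode_local(2011, 2, 18, 19, 0, 0)
uk_summer = encode_local(2011, 7, 18, 19, 0, 1)    # hypothetical July date, for comparison
california = encode_local(2011, 2, 18, 19, 0, -8)

print(uk_winter)   # 2011-02-18T19:00:00+00:00
print(uk_summer)   # 2011-07-18T19:00:00+01:00
print(california)  # 2011-02-18T19:00:00-08:00
```

The local clock time stays the same in each case; only the offset changes, which is exactly the part Facebook's mark-up gets wrong.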

I’ve already alerted Facebook developer Paul Tarjan to the problem, and this is my response to his requests for assistance in fixing it.

Twitter: A microformat in lieu of a protocol

In May of this year I wrote about the problems of URLs for a given Twitter user’s profile, or for an individual post or “status”, being different depending on the Twitter client in use. I suggested a new protocol for Twitter links. [You might want to read that before the rest of this post.] I can’t believe I didn’t think of this simpler solution sooner!

The answer (in the short term) is to use a microformat (or a microformat-like “poshformat”, if you prefer to call it that) for each case. Let’s say we use the classes twitter-user & twitter-status.

User-agents (that’s jargon for browsers) could then employ a script (such as those used by GreaseMonkey, or a Firefox extension) to ignore the encoded URL and substitute the equivalent for the user’s preferred Twitter client instead.

For links to user profiles:

<a
href="http://twitter.com/pigsonthewing">
Andy Mabbett
</a>

would become:

<a
class="twitter-user"
href="http://twitter.com/pigsonthewing">
Andy Mabbett
</a>

and:

<a
href="http://accessibletwitter.com/app/user.php?uid=pigsonthewing">
Andy Mabbett</a>

would become:

<a
class="twitter-user"
href="http://accessibletwitter.com/app/user.php?uid=pigsonthewing">
Andy Mabbett</a>

Likewise, for individual statuses:

<a
href="http://twitter.com/pigsonthewing/status/1828036334">
something witty</a>

would become:

<a
class="twitter-status"
href="http://twitter.com/pigsonthewing/status/1828036334">
something witty</a>

and:

<a
href="http://accessibletwitter.com/app/status.php?1828036334">
something witty</a>

would become:

<a
class="twitter-status"
href="http://accessibletwitter.com/app/status.php?1828036334">
something witty</a>

and:

<a
href="http://m.slandr.net/single.php?id=1828036334">
something witty</a>

would become:

<a
class="twitter-status"
href="http://m.slandr.net/single.php?id=1828036334">
something witty</a>

To simplify matters, the rules for extracting the user ID or the status ID could be the same in both cases:

  1. Parse the value of the href attribute of the element to which the class applies.
  2. If there is an equals sign, use everything after that.
  3. Otherwise, if there is a question mark, use everything after that.
  4. Otherwise, use everything after the last slash.
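
These rules can be sketched in a few lines of Python. The function name extract_id is my own, and this is an illustration rather than a reference parser. Note that the sketch tests for an equals sign before a question mark, so that user.php?uid=pigsonthewing yields pigsonthewing rather than uid=pigsonthewing.

```python
def extract_id(href: str) -> str:
    """Pull the user ID or status ID out of a link's href value."""
    href = href.strip()
    if "=" in href:
        # e.g. user.php?uid=pigsonthewing or single.php?id=1828036334
        return href.partition("=")[2]
    if "?" in href:
        # e.g. status.php?1828036334
        return href.partition("?")[2]
    # e.g. twitter.com/pigsonthewing or .../status/1828036334
    return href.rsplit("/", 1)[-1]


examples = [
    "http://twitter.com/pigsonthewing",
    "http://accessibletwitter.com/app/user.php?uid=pigsonthewing",
    "http://twitter.com/pigsonthewing/status/1828036334",
    "http://accessibletwitter.com/app/status.php?1828036334",
    "http://m.slandr.net/single.php?id=1828036334",
]

for href in examples:
    print(extract_id(href))
```

The first two hrefs give pigsonthewing and the remaining three give 1828036334. An href with a trailing slash, or with more than one query parameter, would need extra handling that the rules do not yet cover.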

That would deal with all the examples in my earlier post.

So, if you’re using a user-agent which is aware of this microformat, and find on a page:

<a
class="twitter-user"
href="http://twitter.com/pigsonthewing">
Andy Mabbett</a>
said
<a
class="twitter-status"
href="http://m.slandr.net/single.php?id=1828036334">
something witty</a>

but your preferred Twitter client is Dabr (one I recommend, BTW!) then your browser would treat (and possibly render) that as:

<a
class="twitter-user"
href="http://dabr.co.uk/user/pigsonthewing">
Andy Mabbett</a>
said
<a
class="twitter-status"
href="http://dabr.co.uk/status/1828036334">
something witty</a>
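
As a rough illustration of the substitution step, here is a minimal Python sketch (a real user-agent script would more likely be written in JavaScript). The names DABR_TEMPLATES, extract_id and rewrite_href are made up for this example, and the URL patterns are assumptions based on the Dabr links shown above. The ID is found by looking for an equals sign, then a question mark, then the last slash.

```python
# Assumed URL patterns for one preferred client, modelled on the Dabr links above.
DABR_TEMPLATES = {
    "twitter-user": "http://dabr.co.uk/user/{id}",
    "twitter-status": "http://dabr.co.uk/status/{id}",
}


def extract_id(href: str) -> str:
    """Return the part of the href after '=', else after '?', else after the last '/'."""
    href = href.strip()
    for separator in ("=", "?"):
        if separator in href:
            return href.partition(separator)[2]
    return href.rsplit("/", 1)[-1]


def rewrite_href(class_attr: str, href: str, templates=DABR_TEMPLATES) -> str:
    """Return the href for the preferred client, or the original href if no class matches."""
    for name in class_attr.split():  # a class attribute may hold several names
        if name in templates:
            return templates[name].format(id=extract_id(href))
    return href


print(rewrite_href("twitter-user", "http://twitter.com/pigsonthewing"))
print(rewrite_href("twitter-status", "http://m.slandr.net/single.php?id=1828036334"))
```

Links without either class are passed through untouched, so the script would be safe to run on any page.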

Simples!