Tag Archives: microformat

The BBC, Open Content and Wikipedia

I had a really interesting meeting with Robin Morley, the BBC's social media lead for the English Regions, a couple of weeks ago. After he gave me a very interesting tour of their premises in Birmingham's Mailbox (where, in its former guise as Royal Mail's Birmingham head office, my father Trevor had an office), he described his work to me.

We then discussed how his London colleagues automatically insert content from Wikipedia into the BBC website's pages on wildlife (example: Barn Owl) and on music (example, of course: Pink Floyd). I contributed to the former by writing the markup that makes those wildlife pages emit the ‘species’ microformat, of which I'm also the author.


BBC article on Pink Floyd, including Wikipedia content (links to original article)

They are able to do this because all of Wikipedia’s content is available under a Creative Commons Attribution-ShareAlike licence. In other words, anyone can reuse it, for free.

I suggested to Robin that his news staff could similarly reuse Wikipedia content. For example, the article “Birmingham Assay Office silver name plaque stolen”:


BBC Birmingham & Black Country article on a theft from Birmingham Assay Office (links to original article)

could use text from Wikipedia in a pullout (a sub-section, or box at the side of the article) which might say:

The Birmingham Assay Office is one of the four remaining assay offices in the United Kingdom.

It opened on 31 August 1773 and initially operated from three rooms in the King’s Head Inn on New Street employing only four staff and was only operating on a Tuesday. The first customer on that day was Matthew Boulton. The hallmark of the Birmingham Assay Office is the Anchor.

Services provided by the office include nickel testing, metal analysis, plating thickness determination, bullion certification, consultancy and gem certification.

Text in this section copyright Wikipedia authors, licensed under a Creative Commons Attribution-ShareAlike licence

All that would be required is for Wikipedia to be credited, and for the pullout text (but not the whole BBC article) to be made available under the same open licence, as above.

This could be done on articles about all sorts of topics: people, places, organisations, events and more, as well as sports reports.

Robin seemed to like the idea, so I’m looking forward to seeing how he and his colleagues make use of Wikipedia content.

Update: Another post, “The BBC, Regional News and Sport, and Hyperlocal Blogs”, about something else we discussed at our meeting, is now published.

Extracting contact data from e-mail signatures


0000-0001-5882-6823

I often receive emails with signatures (footers; also known as “sigs”) like this one:

-- 
John Doe
Director of Flying Cars, Acme Ltd.,
A: 21, Example Street, Birmingham, B1 1AA
E: john.doe@example.com
T: 0121 555 5555
M: 05555 555555
W: www.example.com

(that first line is the standard sig separator of “dash dash space return”)

It’s irritating, if I want to add the sender to my address book, to have to copy’n’paste each item separately.

It seems to me that it should be possible for mail clients, such as Google Mail, to parse such sigs, and allow the user — after making any necessary edits — to add them to their electronic address books, without needing vCard (.vcf) file attachments or hCard microformat markup (which is not possible in plain-text e-mail).

It would need some agreement (or declaration by fiat) on which short codes to use:

A: = Address
E: = E-mail
T: = (landline) Telephone
M: = Mobile (cell) telephone
W: = Website

and which properties can be plural. It might be determined that the first line should be the name; the second the job title/company — but, as users would be offered an “edit” option before saving the data, that’s not a deal-breaker.
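A first pass at parsing the labelled format might look like the Python sketch below. It assumes the short codes listed above (which have not been agreed anywhere), treats every property as potentially plural by storing values in lists, and applies the "first line is the name, second is the job title/company" convention, leaving the user to correct the result. The function and field names are hypothetical.

```python
import re

# The short codes proposed above; these are an assumption, not an agreed standard.
LABELS = {"A": "address", "E": "email", "T": "telephone", "M": "mobile", "W": "website"}
LABELLED = re.compile(r"([A-Z]):\s*(.+)")


def parse_labelled_sig(message: str) -> dict:
    """Extract contact fields from a labelled e-mail signature (hypothetical helper)."""
    # Use only the text after the last "-- " separator line (or the whole message if none).
    sig = re.split(r"(?m)^-- ?$", message)[-1]
    contact: dict = {"name": None, "title_org": None}
    unlabelled = []
    for line in sig.splitlines():
        line = line.strip()
        match = LABELLED.match(line)
        if match and match.group(1) in LABELS:
            # Any property might be plural, so every value is collected in a list.
            contact.setdefault(LABELS[match.group(1)], []).append(match.group(2))
        elif line:
            unlabelled.append(line)
    # Assumed convention: first unlabelled line is the name, second the job title/company.
    if unlabelled:
        contact["name"] = unlabelled[0]
    if len(unlabelled) > 1:
        contact["title_org"] = unlabelled[1]
    return contact


example = (
    "Thanks,\nJohn\n\n-- \nJohn Doe\nDirector of Flying Cars, Acme Ltd.,\n"
    "A: 21, Example Street, Birmingham, B1 1AA\nE: john.doe@example.com\n"
    "T: 0121 555 5555\nM: 05555 555555\nW: www.example.com\n"
)
contact = parse_labelled_sig(example)
print(contact)
```

A mail client would then show these values in an editable form before saving them, so a wrong guess about the name line costs the user one correction rather than seven copy-and-paste operations.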

Some thought would need to be given to internationalisation, also: do the above abbreviations make sense to speakers of, say, German or French? What about Japanese or Chinese speakers?

Alternative

Even without the labels:

-- 
John Doe
Director of Flying Cars, Acme Ltd.,
21, Example Street, Birmingham, B1 1AA
john.doe@example.com
0121 555 5555
05555 555555
www.example.com

it should be possible to scrape some data (e-mail address, phone numbers, website, name if on first line) from a sig.
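Without labels, pattern matching can still pick out the more distinctive items. The regular expressions in this Python sketch are deliberately loose and tuned to the UK-style example above (a phone number is taken to be at least ten digits and spaces on one line); they are illustrative assumptions, not a robust address-book importer, and the helper name is hypothetical.

```python
import re

# Loose, illustrative patterns; real signatures would need far more careful handling.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d ]{8,}\d")  # 10+ digits/spaces, starting and ending with a digit
WEB = re.compile(r"(?:https?://|www\.)\S+")


def scrape_sig(lines: list) -> dict:
    """Guess contact data from an unlabelled signature (hypothetical helper)."""
    text = "\n".join(lines)
    return {
        "name": lines[0].strip() if lines else None,  # only right if the name is on the first line
        "emails": EMAIL.findall(text),
        "phones": PHONE.findall(text),
        "websites": WEB.findall(text),
    }


found = scrape_sig([
    "John Doe",
    "Director of Flying Cars, Acme Ltd.,",
    "21, Example Street, Birmingham, B1 1AA",
    "john.doe@example.com",
    "0121 555 5555",
    "05555 555555",
    "www.example.com",
])
print(found)
```

Telling a landline from a mobile, or recognising the address line, is where this approach runs out, which is the case for the labelled format.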

Over to you

Does anyone fancy writing a demonstrator plug-in to parse such sigs, for an extensible mail client such as Thunderbird, or a browser like Firefox or Chrome? Do such things already exist anywhere?

Fixing Facebook’s Microformats (at their request)


Twitter, and the wider ‘blogosphere’, have been alive tonight (UK time), with people commenting on, or mostly simply repeating, the news that Facebook have implemented the hCalendar microformat on all their events, making them parsable by machines and thus easy to add to desktop or on-line calendars. They’ve also included hCard microformats for venue details.

This is generally a good thing, but what most people — at least some of whom should have known better — failed to notice was that the implementation is broken.

Consider this event, a concert by my friends’ band, Treebeard:

which, as you can see, is on 18 February from 19:00–22:00 (7–10pm).

That’s encoded, in the Facebook page’s mark-up, as:

19:00 – 22:00

The “-08:00” at the end of each date-time value represents a timezone 8 hours behind UTC (as we must now call Greenwich Mean Time). That would be correct were the event in California, or elsewhere in the Pacific Time Zone; but for the UK, the mark-up translates to 3–6am the following morning.

Since the event is in the UK, the start time should be encoded as “2011-02-18T19:00:00+00:00”, which gives it the correct offset from UTC (during British Summer Time, it would be “2011-02-18T19:00:00+01:00”). Ditto for the end time.
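The arithmetic can be checked with Python's standard datetime module. This sketch assumes, as the text implies, that Facebook's start and end values were "2011-02-18T19:00:00-08:00" and "2011-02-18T22:00:00-08:00". In practice the venue's offset for a given date would come from a time-zone database (in Python, zoneinfo.ZoneInfo("Europe/London"), which also handles British Summer Time); fixed offsets are used here so the example runs without one.

```python
from datetime import datetime, timedelta, timezone


def as_utc(value: str) -> datetime:
    """Parse an ISO 8601 date-time that carries an offset, and convert it to UTC."""
    return datetime.fromisoformat(value).astimezone(timezone.utc)


# Assumed Facebook values: 19:00 and 22:00 at UTC-08:00 ...
wrong_start = as_utc("2011-02-18T19:00:00-08:00")
wrong_end = as_utc("2011-02-18T22:00:00-08:00")
# ... are 03:00 and 06:00 UTC on 19 February, i.e. 3-6am in the UK in winter.
print(wrong_start.isoformat(), wrong_end.isoformat())

# Correct encoding: attach the venue's own offset before serialising
# (+00:00 for the UK in February; +01:00 would apply during British Summer Time).
uk_winter = timezone(timedelta(hours=0))
right_start = datetime(2011, 2, 18, 19, 0, tzinfo=uk_winter)
print(right_start.isoformat())  # 2011-02-18T19:00:00+00:00
```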

The same will apply for events in any other timezone on the planet, each with an appropriate adjustment.

I’ve already alerted Facebook developer Paul Tarjan to the problem, and this is my response to his requests for assistance in fixing it.

Proposed citation microformat: better than COinS metadata


My new book on Pink Floyd is given as a reference in an article on Pink Floyd’s film, The Wall, on Wikipedia.

The reference is entered using Wikipedia’s {{Cite book}} template, which in turn emits the requisite HTML mark-up, including COinS metadata:

Mabbett, Andy (2010). Pink Floyd - The Music and the Mystery. London: Omnibus,. ISBN <a href="...">9781849383707</a>. <span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.btitle=Pink+Floyd+-+The+Music+and+the+Mystery&rft.aulast=Mabbett&rft.aufirst=%26%2332%3B%26%2332%3BAndy&rft.au=Mabbett%2C%26%2332%3B%26%2332%3B%26%2332%3BAndy&rft.date=2010&rft.place=London&rft.pub=Omnibus%2C&rft.isbn=9781849383707&rfr_id=info:sid/en.wikipedia.org:Pink_Floyd_The_Wall_(film)"></span>

And here it is again, with line breaks inserted for clarity:

Mabbett, Andy (2010). Pink Floyd - The Music and the Mystery. London: Omnibus,. ISBN <a href="...">9781849383707</a>. <span class="Z3988" title="ctx_ver=Z39.88-2004
&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook
&rft.genre=book
&rft.btitle=Pink+Floyd+-+The+Music+and+the+Mystery
&rft.aulast=Mabbett
&rft.aufirst=%26%2332%3B%26%2332%3BAndy
&rft.au=Mabbett%2C%26%2332%3B%26%2332%3B%26%2332%3BAndy
&rft.date=2010
&rft.place=London
&rft.pub=Omnibus%2C
&rft.isbn=9781849383707
&rfr_id=info:sid/en.wikipedia.org:Pink_Floyd_The_Wall_(film)"></span>

Some people complain that the presence of COinS bloats pages which have many references: for instance, note that my name, the publisher (“Omnibus”), the publisher's location (“London”), the ISBN, etc., plus the article's URL, are repeated in the title attribute of the span with class="Z3988", which is used to denote COinS metadata.
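Decoding that title attribute shows how much escaping a consumer has to undo. In this Python sketch, the standard library's urllib.parse.parse_qs splits the key/value pairs and reverses the percent-encoding, then html.unescape turns the leftover "&#32;" character references into spaces. The helper name is hypothetical, and for simplicity it keeps only the first value of any repeated key (COinS allows several rft.au entries, for example).

```python
import html
from urllib.parse import parse_qs

# The title attribute from the Wikipedia example above, with the line breaks removed.
TITLE = (
    "ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook"
    "&rft.genre=book&rft.btitle=Pink+Floyd+-+The+Music+and+the+Mystery"
    "&rft.aulast=Mabbett&rft.aufirst=%26%2332%3B%26%2332%3BAndy"
    "&rft.au=Mabbett%2C%26%2332%3B%26%2332%3B%26%2332%3BAndy"
    "&rft.date=2010&rft.place=London&rft.pub=Omnibus%2C"
    "&rft.isbn=9781849383707"
    "&rfr_id=info:sid/en.wikipedia.org:Pink_Floyd_The_Wall_(film)"
)


def parse_coins(title: str) -> dict:
    """Decode a COinS (Z39.88 KEV) title attribute into a flat dict (hypothetical helper)."""
    fields = parse_qs(title, keep_blank_values=True)
    # parse_qs has undone the %-escapes and '+'-for-space; the values still hold
    # HTML character references such as &#32;, so unescape those too.
    return {key: html.unescape(values[0]).strip() for key, values in fields.items()}


coins = parse_coins(TITLE)
for key, value in coins.items():
    print(f"{key:12} {value}")
```

Even after decoding, "rft.aulast", "rft.aufirst" and "rft.au" all restate the same author, which is the repetition described above.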

If we were to mark that up with the proposed citation microformat, hCite, it might look like this:

Mabbett, Andy (2010). Pink Floyd - The Music and the Mystery. London: Omnibus,. ISBN <a href="..." class="isbn">9781849383707</a>.

Despite adding semantic class names (hcite, author, vcard, fn, dtstart, work, publisher, label, org, isbn), and span elements on which to hang some of them, this is markedly more compact (down from 614 to 363 characters, excluding the redacted URL), easier for humans to read, doesn’t require lots of escaped characters and doesn’t repeat any of the data. The processing burden on Wikipedia’s servers would also be lower.

The primary tool used for accessing COinS metadata is Zotero, whose authors have already indicated to me (in conversation) an interest in parsing such a microformat.

Discussions on the microformats mailing list of what to include in the proposed citation microformat stalled some time ago, but that doesn't stop an organisation like Wikipedia from developing a draft, implementing it, and presenting it to the wider web community for discussion, improvement and ratification. The draft could have a 1–1 mapping with the properties in COinS, to make conversion easier for parsing tools.

Twitter: A microformat in lieu of a protocol


In May of this year I wrote about the problem that the URL for a given Twitter user's profile, or for an individual post or “status”, differs depending on the Twitter client in use, and I suggested a new protocol for Twitter links. [You might want to read that before the rest of this post.] I can't believe I didn't think of this simpler solution sooner!

The answer (in the short term) is to use a microformat (or a microformat-like “poshformat”, if you prefer to call it that) for each case. Let's say we use the classes twitter-user & twitter-status.

User-agents (that's jargon for browsers) could then employ a script (such as a GreaseMonkey user script, or a Firefox extension) to ignore the encoded URL and substitute the equivalent for the user's preferred Twitter client instead.

For links to user profiles:

<a href="http://twitter.com/pigsonthewing"> Andy Mabbett </a>

would become:

<a class="twitter-user" href= "http://twitter.com/pigsonthewing"> Andy Mabbett </a>

and:

<a href="http://accessibletwitter.com/app/user.php?uid=pigsonthewing"> Andy Mabbett</a>

would become:

<a class="twitter-user" href="http://accessibletwitter.com/app/user.php?uid=pigsonthewing"> Andy Mabbett</a>

Likewise, for individual statuses:

<a href="twitter.com/pigsonthewing/status/1828036334"> something witty</a>

would become:

<a class="twitter-status" href="twitter.com/pigsonthewing/status/1828036334"> something witty</a>

and:

<a href="accessibletwitter.com/app/status.php?1828036334"> something witty</a>

would become:

<a class="twitter-status" href="accessibletwitter.com/app/status.php?1828036334"> something witty</a>

and:

<a href="m.slandr.net/single.php?id=1828036334"> something witty</a>

would become:

<a class="twitter-status" href="m.slandr.net/single.php?id=1828036334"> something witty</a>

To simplify matters, the rules for extracting the user ID or the status update could be the same in both cases:

Parse the value of the href attribute of the element to which the class applies.
If there is an equals sign, use everything after the last one.
Otherwise, if there is a question mark, use everything after that.
Otherwise, use everything after the last slash.

That would deal with all the examples in my earlier post.
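The extraction rules can be sketched in Python as a single function. One point to flag: this sketch checks for an equals sign before a question mark, because applying the question-mark rule first would turn "user.php?uid=pigsonthewing" into "uid=pigsonthewing" rather than the bare user name. The function name is hypothetical.

```python
def extract_twitter_id(href: str) -> str:
    """Return the user name or status ID encoded in a Twitter-client URL."""
    if "=" in href:
        # e.g. user.php?uid=pigsonthewing or single.php?id=1828036334
        return href.rsplit("=", 1)[1]
    if "?" in href:
        # e.g. status.php?1828036334
        return href.split("?", 1)[1]
    # e.g. twitter.com/pigsonthewing or twitter.com/pigsonthewing/status/1828036334
    return href.rsplit("/", 1)[-1]


for href in [
    "http://twitter.com/pigsonthewing",
    "http://accessibletwitter.com/app/user.php?uid=pigsonthewing",
    "twitter.com/pigsonthewing/status/1828036334",
    "accessibletwitter.com/app/status.php?1828036334",
    "m.slandr.net/single.php?id=1828036334",
]:
    print(href, "->", extract_twitter_id(href))
```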

So, if you’re using a user-agent which is aware of this microformat, and find on a page:

<a class="twitter-user" href="http://twitter.com/pigsonthewing"> Andy Mabbett</a> said <a class="twitter-status" href="m.slandr.net/single.php?id=1828036334"> something witty</a>

but your preferred Twitter client is Dabr (one I recommend, BTW!) then your browser would treat (and possibly render) that as:

<a href="dabr.co.uk/user/pigsonthewing"> Andy Mabbett</a> said <a class="twitter-status" href="dabr.co.uk/status/1828036334"> something witty</a>
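The substitution step might be sketched like this, in Python rather than a browser script. The Dabr URL templates are taken from the example above; the template table, the regular expression and the function names are hypothetical. It assumes the class attribute comes before href, exactly as in these examples; a real user script would walk the page's DOM instead of pattern-matching HTML. Unlike the rendering shown above, it keeps the class attribute on both rewritten links.

```python
import re

# Hypothetical URL templates for one preferred client (Dabr, from the example above).
DABR = {
    "twitter-user": "dabr.co.uk/user/{id}",
    "twitter-status": "dabr.co.uk/status/{id}",
}

# Assumes class precedes href, as in the examples; tolerates stray spaces inside href="...".
LINK = re.compile(r'<a class="(twitter-user|twitter-status)" href="\s*([^"]*)"')


def _id_from(href: str) -> str:
    # Extraction rules: equals sign first, then question mark, then last slash.
    for sep in ("=", "?"):
        if sep in href:
            return href.rsplit(sep, 1)[1]
    return href.rsplit("/", 1)[-1]


def rewrite_for_client(html_text: str, templates: dict = DABR) -> str:
    """Point every twitter-user / twitter-status link at the preferred client."""
    def swap(m: re.Match) -> str:
        css_class, href = m.group(1), m.group(2)
        new_href = templates[css_class].format(id=_id_from(href))
        return f'<a class="{css_class}" href="{new_href}"'
    return LINK.sub(swap, html_text)


page = (
    '<a class="twitter-user" href="http://twitter.com/pigsonthewing"> Andy Mabbett</a> said '
    '<a class="twitter-status" href="m.slandr.net/single.php?id=1828036334"> something witty</a>'
)
print(rewrite_for_client(page))
```

Links without either class are left untouched, so ordinary hyperlinks on the page are unaffected.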

Simples!

Andy Mabbett, aka pigsonthewing.

Freelance Wikipedia, Wikidata and OpenStreetMap consultant and Wikimedian in Residence, from Birmingham, England.
