Tuesday, 27 January 2015

Linkrot and the mset^H^H^H^H data-versiondate attribute

I run an archaeological website (http://vici.org/) backed by a database of over 20,000 records. Many of these records contain external links. Of course, every now and then link rot creeps in: links stop working, or lead to a page whose content is no longer what was intended.

To overcome this I've started creating a tool that will auto-archive all external links. When a user clicks on a link, a piece of JavaScript will invoke a tiny service that returns the HTTP status code of the requested page. If the page is not available (returning a 404), the user will be redirected to a web archive. The aim is that the site will eventually run its own web archive, auto-archiving each newly discovered link.
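As a rough sketch of that click flow (not the actual tool): the function names, the `/linkstatus` endpoint and its JSON response shape are all hypothetical, and the Wayback Machine merely stands in for the site's future archive.

```javascript
// Assumption: the Wayback Machine stands in for the archive; the plan is
// to run a site-specific web archive eventually.
function archiveUrlFor(href) {
  return 'https://web.archive.org/web/' + href;
}

// Pick the final destination from the status code the service reported.
function resolveTarget(href, status) {
  return status === 404 ? archiveUrlFor(href) : href;
}

// Browser-only wiring; skipped when no DOM is present (e.g. under Node).
if (typeof document !== 'undefined') {
  document.addEventListener('click', async (event) => {
    const link = event.target.closest('a[href^="http"]');
    if (!link) return;
    event.preventDefault();
    let target = link.href;
    try {
      // '/linkstatus' is a hypothetical endpoint returning {"status": 404}.
      const response = await fetch('/linkstatus?url=' + encodeURIComponent(link.href));
      const body = await response.json();
      target = resolveTarget(link.href, body.status);
    } catch (error) {
      // If the check itself fails, fall back to the original link.
    }
    window.location.href = target;
  });
}
```

Keeping the decision in a small pure function makes it easy to later treat other status codes as dead links without touching the click handler.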

When directing a user to an archived version of a page, we ideally link to the very version the author had in mind when creating the link. A bare hyperlink does not carry that information. This can be solved by following an approach originally suggested by Ryan Westphal, Herbert Van de Sompel and Michael L. Nelson in "The mset Attribute". It proposes to enrich hyperlinks with an attribute that provides temporal context, refers to a specific archived copy, or both. Their draft has now been superseded by the Memento Robust Links specification (Robust Links - Link Decoration, see also Robust Links - Motivation).

A hyperlink following this specification could look like:

<a href="http://www.w3.org/spec.html" data-versiondate="2014-03-17">HTML</a>

or

<a href="http://www.w3.org/spec.html" data-versiondate="2014-03-17"
data-versionurl="https://archive.today/r7cov">HTML</a>
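To illustrate how a script might use these two attributes when the original page is gone, here is a minimal sketch. The function name is hypothetical, it assumes data-versiondate is in plain YYYY-MM-DD form, and it assumes the Wayback Machine's dated URL pattern as the fallback archive.

```javascript
// Choose an archive URL for a decorated link.
// versionDate and versionUrl correspond to data-versiondate and
// data-versionurl; either may be undefined.
function datedArchiveUrl(href, versionDate, versionUrl) {
  // A specific archived copy, when given, is the most precise target.
  if (versionUrl) return versionUrl;
  // Without a date, fall back to whatever capture the archive offers.
  if (!versionDate) return 'https://web.archive.org/web/' + href;
  // "2014-03-17" -> "20140317"; the archive is expected to redirect
  // to the capture closest to that date.
  const timestamp = versionDate.replace(/-/g, '');
  return 'https://web.archive.org/web/' + timestamp + '/' + href;
}
```

In a browser, the two values would be read from the clicked element as `link.dataset.versiondate` and `link.dataset.versionurl`.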

I intend to implement the data-versiondate attribute in the CMS of the website. When a new link is added to a record, the CMS will insert a data-versiondate attribute using the current date.
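What that CMS step could look like, as a sketch only: the real CMS code is not shown here, so both function names are hypothetical, and the date is taken in UTC for simplicity.

```javascript
// Escape text for safe use inside HTML attributes and element content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a link stamped with the date on which it was added.
// `now` is injectable so the output can be checked against a fixed date.
function buildRobustLink(href, text, now = new Date()) {
  // toISOString() is always UTC; the first ten characters are YYYY-MM-DD.
  const versionDate = now.toISOString().slice(0, 10);
  return '<a href="' + escapeHtml(href) + '" data-versiondate="' +
    versionDate + '">' + escapeHtml(text) + '</a>';
}
```

The date should be stamped only when a link is first added; re-stamping on every edit of the record would lose the moment the author actually looked at the page.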

Update 2015-01-27 17:09 (CET): added the Robust Link specs and changed examples accordingly.

PS: See also the W3C community group on Robustness and Archiving.
