The tag URI scheme: a simpler, more democratic URN

I recently stumbled across the tag URI scheme, a convention that does a lot of what urn and http-based identifiers do with less ambiguity and confusion. But perhaps I had better explain what I mean by URI first.

Background: There are lots of places in the on-line world where it is helpful to have a tag for something that is guaranteed not to accidentally conflict with a name chosen by someone else. For example, XML namespace identifiers need to be unique. There are various ways to do this; UUIDs (also called GUIDs) and Netnews message-ids are examples of standard schemes.

One way popular at the W3C is to use a URL that you ‘own’ as your unique identifier. Other people will not accidentally choose the same name because if they use a URL it will contain their domain name, not yours. For instance, http URLs are used to identify XML namespaces (XHTML has the namespace http://www.w3.org/1999/xhtml), and URIs are used in the topic map core as subject indicators (such as http://www.topicmaps.org/xtm/1.0/core.xtm#class-instance).

XML processors do not have to do anything with the namespace URI except check whether it is the same as some other namespace URI; they do not download it. Nevertheless there has been continual speculation as to what sort of resource should be retrievable from that address.
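To make the point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The helper name namespace_of and the two sample documents are my own illustrations, not part of any specification. The parser never fetches the namespace URI; it just carries the string along, so a variant with a trailing slash counts as a different namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def namespace_of(xml_text):
    """Return the namespace URI of the root element, or None if it has none.

    Hypothetical helper: ElementTree stores a qualified name as
    "{namespace-uri}localname", so we just peel the URI off the front.
    """
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


doc_a = '<html xmlns="http://www.w3.org/1999/xhtml"/>'
doc_b = '<html xmlns="http://www.w3.org/1999/xhtml/"/>'  # note the trailing slash

# Plain string comparison; nothing is downloaded at any point.
same_a = namespace_of(doc_a) == XHTML_NS
same_b = namespace_of(doc_b) == XHTML_NS
```

Because only string equality matters, an identifier that cannot be downloaded at all serves just as well as a namespace name.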

For this reason, some people prefer to use URNs. URNs are uniform resource names, meaning they say what something is without helping you to find it, as opposed to uniform resource locators, which say how to download some resource on the WWW while remaining vague about exactly what it is. URNs look just like URLs, except that rather than starting with http or ftp they all start with urn.

At various points in WWW history, URNs have been considered either a type of URL (because they use the same syntax as URLs) or distinct from URLs (because they do not actually give the location of a resource). To avoid having to decide one way or another, WWW recommendations use the term URI (uniform resource identifier) to refer collectively to URLs and URNs. To further muddy the waters, there are plenty of non-resolvable URL schemes (you cannot actually download isbn:1-86197-612-7), and RFC 2168 discusses how to resolve URNs!

One problem with URNs is that although the URN concept and urn prefix have been around since the start of the WWW, a formal syntax for URNs and systematic management of the URN namespaces are relatively recent and fairly obscure. Before then, people assumed that any string likely to be unique with urn and a colon added to the start would do. As a result, there are plenty of bogus URNs out there.

Another tricky thing is that if I wanted to mint some URNs to use as XML namespaces or whatever, I would have to apply to the IANA for a URN namespace (documenting how the identifiers in that namespace are checked for uniqueness, and so on), attend IETF meetings, etc., all of which is a lot of bother. So URNs do not make sense for everyday purposes, only for applications with global scope and big travel budgets.

The tag scheme, by contrast, is refreshingly simple and unencumbered by bureaucracy. Anyone with a domain name or even just an email address can mint their own URIs without going via any centralized checking. For example, if I want to propose a published subject indicator for W3C date-time formats, I can use tag:alleged.org.uk,2004:datetime:w3c without needing special permission.
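Minting a tag URI is little more than string assembly: an authority (a domain name or email address), a date on which you controlled that authority, and a specific part of your own choosing. The following is a rough sketch, not a full validator. The function name mint_tag and the two regular expressions are my own assumptions, and the patterns are deliberately looser than the formal grammar in the tag scheme's specification.

```python
import re

# Loose checks only (assumptions for illustration, not the formal grammar):
# a date is YYYY, YYYY-MM or YYYY-MM-DD; an authority is a domain name,
# optionally preceded by "mailbox@".
_DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")
_AUTHORITY_RE = re.compile(r"([A-Za-z0-9._-]+@)?[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*")


def mint_tag(authority, date, specific):
    """Assemble a tag URI of the form tag:<authority>,<date>:<specific>."""
    if not _AUTHORITY_RE.fullmatch(authority):
        raise ValueError(f"not a domain name or email address: {authority!r}")
    if not _DATE_RE.fullmatch(date):
        raise ValueError(f"date must be YYYY, YYYY-MM or YYYY-MM-DD: {date!r}")
    return f"tag:{authority},{date}:{specific}"


# Reproduces the example identifier from the text above.
w3c_datetime = mint_tag("alleged.org.uk", "2004", "datetime:w3c")
```

No registry is consulted anywhere in this process; uniqueness rests entirely on the minter having held the domain or address on the stated date.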

So what’s the advantage, if any, over using an http URI like http://www.alleged.org.uk/2002/datetime.html#w3c? One difference is that the http version commits me to maintaining an HTML page indefinitely; if for some reason I changed my mind, or the domain lapsed or whatever, the URI would remain as a dangling pointer to a resource that no longer existed. It also means that I cannot easily publish the description in a different format (such as XTM or RDF).

The tag URI makes no promises of resources, which is actually an advantage because it can’t break promises it doesn’t make. You might worry about how one finds information about what a tag URI means, but don’t. First, it has no meaning without surrounding context, which should furnish clues, and second, there’ll always be Google.

Update 2004-05-29. Mark Pilgrim has an article on Atom ids describing tag URIs.

Update 2006-11-01. The tag scheme is now published as RFC 4151.