This is a continuation of my pointless musing about a hypothetical alternative to XML called MU.
The importance of metadata
A MU instance cannot stand entirely alone -- no text file can, because you need to know the encoding used to convert the character data to the sequence of bytes actually stored on disc or transmitted over the network. (As discussed in the previous entry, the MU data does not describe its own encoding.)
I will describe two ways to store metadata for MU documents, one using
MIME media-type parameters and the other as a separate resource in a
format called MUD. A MUD-savvy web server will use the metadata in
the MUD to construct the
In the fantasy world in which MU exists, the Internet media-type for
MU is text/mu, with variations like
permitted in a manner similar to the
There is some discussion as to where XML data should be
text/xml. I am pretty sure that the reasons
for preferring the former are that (a) not all XML data is supposed to
be a document for reading by human beings, and (b) media-types
text/ are subject to transcoding, and that would break
XML documents that mention their own encoding. MU is not intended as
a general-purpose serialization format, and can be transcoded (because
the program doing the transcoding will emend the metadata), so I think
text/mu is OK. I might be wrong, though.
The charset parameter specifies how to convert the bytes into
characters. Thus one can have things like
And for compatibility with other RFCs I am too lazy to look up,
- an omitted charset means the MU document is in strict US-ASCII;
- if the charset is UTF-16, then the data may be (a) a byte-order mark 0xFF 0xFE followed by UTF-16LE data, (b) a BOM 0xFE 0xFF followed by UTF-16BE data, or (c) UTF-16BE data with no BOM.
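As a rough sketch of the two rules above, here is how a processor might turn the bytes into characters. The function name decode_mu is made up for illustration, and passing any other charset straight to Python's codec lookup is my assumption, not something a MU spec says.

```python
from typing import Optional


def decode_mu(data: bytes, charset: Optional[str] = None) -> str:
    """Decode MU bytes following the charset rules sketched in the text.

    Hypothetical helper: the name and the fall-through behaviour for
    other charsets are assumptions made for this example.
    """
    if charset is None:
        # An omitted charset means strict US-ASCII: any byte >= 0x80
        # raises UnicodeDecodeError rather than being guessed at.
        return data.decode("ascii")

    name = charset.strip().lower()
    if name == "utf-16":
        if data[:2] == b"\xff\xfe":
            # (a) BOM 0xFF 0xFE, then little-endian data
            return data[2:].decode("utf-16-le")
        if data[:2] == b"\xfe\xff":
            # (b) BOM 0xFE 0xFF, then big-endian data
            return data[2:].decode("utf-16-be")
        # (c) no BOM: big-endian
        return data.decode("utf-16-be")

    # Assumption: any other charset name is handed to the codec registry.
    return data.decode(name)
```

The UTF-16 case is handled by hand because Python's own "utf-16" codec falls back to the machine's native byte order when there is no BOM, which is not rule (c).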
The first point diverges from HTML, which vaguely assumes ISO 8859-1
in the absence of a charset but then requires increasingly
hair-raising guesses depending on the byte values the parser
encounters. But MU documents must always be accompanied by a valid
media-type, and that MUST include a
charset if the encoding is not
Another parameter is
mud. Its value is a list of one or more URIs, separated by whitespace,
naming the MUD resources to be used when interpreting this MU
resource. For example,
text/mu; charset=UTF-8; mud="http://example.org/muds/formz.mud"
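To make the parameter syntax concrete, here is a small sketch that pulls the type, the charset and the list of MUD URIs out of a media-type string like the one above. It leans on the header parsing in Python's standard email package; the function name parse_mu_media_type and the shape of the returned dictionary are my own invention.

```python
from email.message import Message


def parse_mu_media_type(value: str) -> dict:
    """Split a MU media-type string into type, charset and MUD URIs.

    Hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Type"] = value
    mud = msg.get_param("mud")  # None when the parameter is absent
    return {
        "type": msg.get_content_type(),
        # None here would mean strict US-ASCII, per the rule above.
        "charset": msg.get_param("charset"),
        # The mud value is a whitespace-separated list of URIs.
        "muds": mud.split() if mud else [],
    }


info = parse_mu_media_type(
    'text/mu; charset=UTF-8; mud="http://example.org/muds/formz.mud"'
)
```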
Note that MUDs for popular formats will of course be cached, so the standard URI should be used wherever possible.
In this document I am not describing MUD features in complete detail as would be required by a real spec. The underlying format of the MUD unit is YAML. A formal specification of MUD, if such a thing existed, might define a syntax that is a subset or profile of the full YAML syntax. For example:
media type: text/html+mu
charset: UTF-8
features:
  - tag:alleged.org,2004:mu:link/1.0
  - tag:alleged.org,2004:mu:object/1.0
namespaces:
  html: http://www.w3.org/1999/xhtml
  dc: http://purl.org/dc/1.0
The media type property is the MIME type and subtype, sans parameters.
charset gives the character encoding of the MU. A MUD-savvy web
server would use these to generate a complete MIME media-type for the
MU resource, with a
mud parameter pointing at the MUD unit. Note
that the media type from the web server takes precedence over any
media type and
charset parameters in the MUD (to allow for
The connection of MU to MUD for files on a computer's file system will
depend partly on the underlying operating system. On a Macintosh with
HFS, the MUD unit might conceivably be a
MUD! resource of the MU
document. Ditto for other operating systems which allow multipart
files natively -- although if I were to point out that this includes
NTFS it would probably surprise a lot of Windows NT developers!
Otherwise processors could automatically expect that a document
foo/bar.mu will have a MUD resource available as
imports has as its value a list of URIs, each on a separate
line, introduced with a hyphen-space (YAML list syntax). For example:
imports:
  - http://www.example.org/2004/html.mud
  - http://www.example.org/2004/formz.mud
  - http://www.example.org/2004/mathml.mud
Importing means reading in the extra MUD files and merging their properties.
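I have not said what merging means exactly, so the following is only one plausible reading, sketched over MUDs that have already been parsed into dictionaries. The assumptions are all mine: lists (such as features) are concatenated without duplicates, maps (such as namespaces) are combined, and for simple values (such as charset) the importing MUD wins over whatever it imports. The load argument is a hypothetical callable that fetches and parses the MUD at a URI.

```python
def _merge_into(target: dict, source: dict) -> None:
    """Fold the properties of source into target (assumed merge rules)."""
    for key, value in source.items():
        if isinstance(value, list):
            existing = target.setdefault(key, [])
            for item in value:
                if item not in existing:
                    existing.append(item)
        elif isinstance(value, dict):
            target.setdefault(key, {}).update(value)
        else:
            target[key] = value  # later (more local) values win


def merge_mud(mud: dict, load, _seen=None) -> dict:
    """Resolve the imports of a parsed MUD and return the merged result."""
    seen = set() if _seen is None else _seen
    merged = {}
    for uri in mud.get("imports", []):
        if uri in seen:
            continue  # skip repeated or circular imports
        seen.add(uri)
        _merge_into(merged, merge_mud(load(uri), load, seen))
    # The importing MUD's own properties are applied last.
    _merge_into(merged, {k: v for k, v in mud.items() if k != "imports"})
    return merged
```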
base can be used to specify the base for partial URIs (same
The use of imports allows for a MUD embedded in a file (as a Macintosh
resource, say) to be very short, limited to a
charset and an
The purpose of namespaces is to allow sets of tags defined by
different organizations to be mixed without prior arrangement.
Namespaces are not intended to be used to indicate the
interpretation of the tags; for that, see the
The namespaces parameter is a mapping from prefixes to URIs. In the
MUD this is represented as prefix-colon-URI pairs, one per line,
indented (the YAML syntax for maps). For example:
namespaces:
  html: http://www.w3.org/1999/xhtml
  form: tag:example.org,2004:formz
Each of these maps a namespace prefix to the subject indicator for its namespace. No implication is made that there is a downloadable resource at this URI; they are used merely to supply a globally unique identifier.
There is no default namespace. Tags with no prefix are promiscuous so far as special meanings (specified by features, below) are concerned.
MU namespaces are simpler than XML namespaces in a couple of ways: they are specified outside the MU document (not on individual elements), and so apply uniformly throughout the document. A MU editor that is assembling a document from arbitrary fragments can collect together their namespaces and if necessary munge the prefixes to make them unique.
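Here is a sketch of how such an editor might collect the namespace maps of several fragments and munge clashing prefixes. The function name, the numbering scheme for renamed prefixes (form, form2, ...) and the decision to reuse an existing prefix when two fragments name the same URI are all assumptions for the sake of the example.

```python
def combine_namespaces(fragment_maps):
    """Merge per-fragment prefix->URI maps into one document-wide map.

    Returns (combined, renames), where renames[i] maps each prefix used
    in fragment i to the prefix it should have in the combined document.
    """
    combined = {}  # prefix -> URI for the assembled document
    by_uri = {}    # URI -> prefix already chosen for it
    renames = []
    for ns in fragment_maps:
        rename = {}
        for prefix, uri in ns.items():
            if uri in by_uri:
                # Same namespace seen before: reuse its prefix.
                new = by_uri[uri]
            else:
                # New namespace: keep the prefix if it is free,
                # otherwise append a number until it is unique.
                new, n = prefix, 1
                while new in combined:
                    n += 1
                    new = f"{prefix}{n}"
                combined[new] = uri
                by_uri[uri] = new
            rename[prefix] = new
        renames.append(rename)
    return combined, renames
```

Because the namespaces live outside the MU text, the renaming only has to touch tag prefixes in the fragments, not any declarations scattered through the markup.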
Mixed-namespace documents should not be needed as much with MU as with
XML, because MU does not need to use namespaces to trigger special
features. For example, MU can use
href for links (using the
features property described below), rather than having to have
xml:href as separately namespaced attributes.
Also, inclusion of data in different formats is expected to be done through links rather than in-line.
features introduces a list of URIs that are subject
indicators for features required in the MU processor to correctly
render this document.
subject indicator means a URI that does not refer to actual
downloadable resources, but is used as a token for a software feature
the processor must support to make sense of this document. The URIs
may well be compiled in to the plug-in implementing the feature.
For example, we might define the URI
tag:alleged.org.uk,2004:mu:link/1.0 to mean that
attributes have the same meanings as
Features will generally be associated with a namespace, but that does not mean that the tags that are recognized by a feature must always have a namespace prefix: tags are offered to all feature implementations and features should match tags (a) with their namespace prefix, and (b) with no prefix.
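The matching rule can be sketched like this. I am assuming each feature is described by the namespace URI it is associated with plus the set of tag names it recognizes; that table, the function name and the HTML feature URI in the usage example are hypothetical.

```python
def matching_features(tag, namespaces, features):
    """Return the feature URIs that recognize a tag.

    tag        -- 'img' or 'html:img'
    namespaces -- prefix -> namespace URI (from the MUD)
    features   -- feature URI -> (namespace URI, set of local tag names)
    """
    prefix, sep, local = tag.rpartition(":")
    if sep:
        ns_uri = namespaces.get(prefix)
        if ns_uri is None:
            return []  # undeclared prefix: nothing can match
    matches = []
    for feature_uri, (feature_ns, names) in features.items():
        if local not in names:
            continue
        # (a) prefixed tags must be in the feature's namespace;
        # (b) unprefixed tags are offered to every feature.
        if not sep or feature_ns == ns_uri:
            matches.append(feature_uri)
    return matches


namespaces = {"html": "http://www.w3.org/1999/xhtml"}
features = {
    "tag:alleged.org,2004:mu:switch/1.0":
        ("tag:alleged.org,2004:mu:switch", {"switch"}),
    # Hypothetical feature URI standing in for "the HTML feature".
    "tag:example.org,2004:html-feature":
        ("http://www.w3.org/1999/xhtml", {"p", "img", "span"}),
}
```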
For example, we might define a
switch tag in a namespace
tag:alleged.org,2004:mu:switch and include the URI
tag:alleged.org,2004:mu:switch/1.0 amongst the features.
This allows us to have a MU fragment like the following:
<p>
  This is HTML with an image:
  <switch>
    <img feature="tag:alleged.org,2004:mu:svg/1.0" src="foo.svg"/>
    <img feature="tag:alleged.org,2004:mu:png/1.0" implementation="msie/5.5" src="foo-noalpha.png"/>
    <img feature="tag:alleged.org,2004:mu:png/1.0" src="foo.png"/>
    <span>FOO</span>
  </switch>
  And on with the text.
</p>
img tags are recognized by the HTML feature,
say, so implicitly in the
switch tag, and
attributes are recognized by the switch feature and are implicitly in
The idea here is to allow modularization of HTML but to only require namespace prefixes when there is ambiguity. This reduces the load on the MU author, which is just as well because we have seen that most HTML developers are terrified of XML namespaces!
If a tag is not recognized by any feature, then the default behaviour is to strip the tag and display its content. This includes the case where a feature is not supported by the application.
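A toy illustration of that fallback, using Python's XML parser as a stand-in for a MU parser (an assumption that only works because the sample happens to be well-formed XML). The function name is made up, and recognized tags are simply echoed back without their attributes to keep the sketch short.

```python
import xml.etree.ElementTree as ET


def strip_unrecognized(elem, recognized):
    """Return the content of elem with unrecognized child tags unwrapped.

    Tags in `recognized` are kept (without attributes, for brevity);
    any other tag is dropped but its content is retained.
    """
    parts = [elem.text or ""]
    for child in elem:
        inner = strip_unrecognized(child, recognized)
        if child.tag in recognized:
            parts.append(f"<{child.tag}>{inner}</{child.tag}>")
        else:
            parts.append(inner)  # strip the tag, keep what it contains
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring("<p>Hello <blink>big <b>world</b></blink>!</p>")
text = strip_unrecognized(root, {"b"})
```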
Finally, one of the feature URIs would be (say)
and would mean that CSS 3 is used to describe how tags are displayed
(in the absence of other applicable features). I imagine that much of
the HTML compatibility could be expressed in CSS 3.
Does this belong in the MUD or the MU? I think the MUD but this implies that every document needs a separate MUD resource.
Links to other resources. Each link is itself a mapping:
links:
  - rel: stylesheet
    type: text/css
    href: foo.css
  - rel: alternate
    type: application/atom+xml
    href: bar.atom
I think I have gone on about my hypothetical MUD for long enough now.