16 entries tagged xml

Version numbers in namespaces considered harmful

The various flavours of RSS offer a variety of namespace requirements:

URL                                          RSS version
http://my.netscape.com/rdf/simple/0.9/       0.9
(default)                                    0.91, 0.92, 0.94
http://purl.org/rss/1.0/                     1.0
http://purl.org/rss/1.0/modules/rss091#      1.0
http://backend.userland.com/rss2             2.0

In my opinion, it is a grave mistake to include a version number in a namespace URI. The function of a namespace is to prevent accidental collisions between names defined by different people (or organizations) when two XML vocabularies are combined in one document. The version number of the format can be specified separately (as indeed all the RSS versions do, as an attribute of their root element). If the 0.9 spec had only used http://netscape.com/1999/rss as its namespace (following the lead of http://www.w3.org/1999/xhtml) then all the versions could have used the same namespace.
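To show the cost, here is a minimal Python sketch of what a consumer has to do just to work out which flavour it has been given. It uses the standard-library ElementTree, and the names `rss_version`, `split_tag` and `RDF_FLAVOURS` are my own, not from any RSS tool. Feeds with an `rss` root are identified by their `version` attribute. The RDF-based flavours have an `rdf:RDF` root, so for those the version is inferred from the namespace of the `channel` element.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the table above that identify the RDF-based flavours.
RDF_FLAVOURS = {
    "http://my.netscape.com/rdf/simple/0.9/": "0.9",
    "http://purl.org/rss/1.0/": "1.0",
}


def split_tag(tag):
    """Split an ElementTree tag of the form '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def rss_version(xml_text):
    """Guess a feed's RSS version, consulting both attribute and namespace."""
    root = ET.fromstring(xml_text)
    _, local = split_tag(root.tag)
    if local == "rss":
        # <rss> roots (0.91, 0.92, 0.94, 2.0) carry a version attribute.
        return root.get("version")
    # RDF-based feeds have an rdf:RDF root; fall back on the channel's namespace.
    for child in root:
        uri, child_local = split_tag(child.tag)
        if child_local == "channel":
            return RDF_FLAVOURS.get(uri)
    return None
```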

Why do I care, you ask? Because if you use actual XML tools like XSLT to manipulate RSS feeds, then the fact that there are three or four namespaces in use for essentially the same elements makes the whole thing more complicated. Where I might have had

<xsl:template match="/rss:rss/rss:channel">...

I had better be prepared for more complicated expressions like

<xsl:template match="/*/*[local-name()='channel']">...

(to allow for any root element, and to ignore the namespace of the channel element). This clutters the XSLT file and makes it harder to maintain—and probably also less efficient. Sigh.
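Here is a sketch of the same namespace-blind matching in Python instead of XSLT. It assumes Python 3.8 or later, where ElementTree paths accept `{*}name` to match an element called `name` in any namespace or none. The `feed_summary` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def feed_summary(xml_text):
    """Return (channel title, [item titles]) for any of the RSS flavours.

    '{*}name' matches 'name' in any namespace or none (Python 3.8+),
    playing the role of local-name() in XSLT.
    """
    root = ET.fromstring(xml_text)
    title = root.findtext("{*}channel/{*}title")
    # Items sit inside <channel> in 0.91/2.0 but beside it in 0.9/1.0,
    # so search all descendants rather than a fixed path.
    items = [item.findtext("{*}title") for item in root.findall(".//{*}item")]
    return title, items
```

The wildcard shares a drawback with `local-name()`: it will also match a `channel` or `item` element from some unrelated vocabulary that happens to be mixed into the feed.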

This is not unique to RSS, by the way: I had all sorts of hassle with early versions of SVG tools, which were caught out by the ever-changing SVG namespace URL. They finally settled on the quite sane http://www.w3.org/2000/svg.

Must ... ignore ... HLink!

The TAG rejects HLink, not many dead scream the headlines. Is this the end for XHTML 2?

What’s XHTML 2? It is the next step in the bridge from HTML 4.01 to the futuristic world of ‘pure’ XML documents. XLink is a recommendation from the W3C for how XML documents should express links to other resources. HLink is a new proposal from the XHTML committee for how XML documents should express links to other resources. In effect, they are saying that XLink is inadequate and they need to replace it. TAG have expressed an opinion that XLink should be used instead, presumably on the grounds that we don’t want to have two W3C recommendations for one and the same thing.

Can XLink replace the special-purpose linking attributes in HTML? I suppose we can imagine replacing img and object tags with something like

<object
    xlink:type="simple"
    xlink:show="embed"
    xlink:actuate="onLoad"
    xlink:href="myLogo.gif"
    width="400" height="300">
  My Logo
</object>

In principle the first three attributes (which would be the same for all images) can be given default values in the DTD. This is the approach used in SVG and MathML. The problem is that it prevents the XML document in question from being stand-alone: the DTD must be downloaded and parsed before the document can be rendered. SVG fudges this; SVG documents often do not have DOCTYPEs, and SVG viewers in effect use a DTD compiled into the software. All very messy.

There’s another problem: the HTML img tag also allows for URIs for a low-res version (the lowsrc attribute), a long description (longdesc), and a client-side image map (usemap). XHTML 2 also wants to add an href attribute to all elements (so any element in the document can be a link). I’m not sure that XLink defines xlink:show values for all of these. Even if it does, we cannot have more than one simple XLink link per element (since we can only have one xlink:href attribute). We could possibly follow the same system as XTM, with one child element per link:

<object style="width:400px; height:300px;">
  <source xlink:href="myLogo.gif"/>
  <usemap xlink:href="#logoMap"/>
  [My Logo]
</object>

(Here we are assuming that the DTD is used to generate default values for the other XLink attributes. But do we really want to rely on all XML browsers being validating browsers?)
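The problem shows up with any non-validating processor. Below is a sketch using Python's ElementTree, which is my choice for illustration; the post does not name a tool. It parses the second example with the xlink namespace declaration added, since the fragment needs one to parse at all. ElementTree does not read external DTDs, so none of the defaulted attributes appear. As I read XLink 1.0, an element with no xlink:type is not an XLink element at all. The `xlink_attrs` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"


def xlink_attrs(xml_text):
    """List (element, href, type) for every element carrying xlink:href.

    ElementTree does not read external DTDs, so it sees only the attributes
    written in the document itself, much like a non-validating XML browser.
    """
    root = ET.fromstring(xml_text)
    return [
        (elem.tag, elem.get(f"{{{XLINK}}}href"), elem.get(f"{{{XLINK}}}type"))
        for elem in root.iter()
        if elem.get(f"{{{XLINK}}}href") is not None
    ]


DOC = """<object xmlns:xlink="http://www.w3.org/1999/xlink"
        style="width:400px; height:300px;">
  <source xlink:href="myLogo.gif"/>
  <usemap xlink:href="#logoMap"/>
  [My Logo]
</object>"""

# Without the DTD's defaults, xlink:type is simply absent (None) on both links.
print(xlink_attrs(DOC))
```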

The question is, will this work? Yes, if we are using a specialized XHTML-2-savvy browser (one which understands object and source, and knows how to interpret them). If the aim is to make XHTML 2 implementable using only XML + CSS 2 + XLink + XBase, without making XHTML a special case, then the answer is no.

The upshot of this is that, if the W3C want to make XHTML 2 just another XML format, displayable in a generic XML browser, then it looks like XLink is not quite right for the job. It may well be that I am missing something, and the above example can be reworked to work with XLink. It might be that lowsrc and longdesc are dumb features that no-one wants to carry over into XHTML 2 anyway. But my naïve understanding of XLink and the nascent XHTML 2 suggests that the XHTML working group might have a point.

How is this going to end? Right now it looks to an outsider like the question is being discussed less in terms of technical issues like what XHTML’s goals are, and more in terms of committees and procedure and politics and such-like. We may end up with an XHTML 2 that requires a specialized XHTML 2 browser (requiring upgrades to existing browsers that recognize the XML namespaces or the DOCTYPE, or any of the other stupid heuristics in use today to distinguish different flavours of HTML). The dawn of XML as a fully-fledged hypertext mark-up language will be delayed by a few more years...

I really should not be writing about this—I have plenty of other things to work on. I just find it difficult to tune out all these arguments about XHTML when that’s what I work with every day...

Picky, Skin, SaxLifter

Found some time to continue work on the Picky Picky Game. I have something which, given a graphics file, writes it into the correct place in the directory structure. Tonight’s task was a routine for generating the index page, based on the pictures stored so far. In the eventual web application, this routine will be invoked in CGI scripts whenever a new picture is added or a vote is recorded. For the present I can just run the Python script (one of the ways in which creating web apps in Python is less hassle than, say, ASP.NET or webclasses).

The index page format is mainly controlled through a ‘skin’ file index.skin. This has most of the HTML, with special XML tags for interpolating the dynamic content. This way hopefully Jeremy will be able to hack the HTML without touching any of the application code. (The immediate inspiration for the term skin comes from the Helma Object Publisher system, which does something similar, but using JavaScript.)
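Here is a minimal sketch of the skin idea. It assumes a hypothetical `<skin:name/>` tag for each slot, since the actual tags in index.skin are not described here. It also uses a regular expression rather than an XML parser, so the rest of the skin can stay loosely written HTML.

```python
import re

# Hypothetical slot syntax: <skin:name/> marks where dynamic HTML goes.
SLOT = re.compile(r"<skin:([A-Za-z_][\w.-]*)\s*/>")


def render_skin(skin, values):
    """Replace each <skin:name/> slot with values[name] (an HTML fragment)."""
    def fill(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"skin refers to unknown slot {name!r}")
        return values[name]
    return SLOT.sub(fill, skin)


page = render_skin(
    "<html><body><h1><skin:title/></h1><skin:pictures /></body></html>",
    {"title": "Picky Picky Game", "pictures": "<ul><li>1.png</li></ul>"},
)
```

Raising an error on an unknown slot means a typo in the skin fails loudly. The alternative would be silently emitting a tag the browser ignores.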

The picture metadata is written in XML which is straightforward enough except that Python’s native SAX support is broken: it does not support XML namespaces! I have fixed this with my own SAX filter dubbed SaxLifter: it processes startElement events by scanning the attributes for namespace prefixes, maintaining a stack of namespace mappings, and generating startElementNS events. Presumably if I were using the XML-SIG or 4Thought enhancements to Python things would work better. Sigh.
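The SaxLifter code itself is not shown, so the following is only a sketch of the technique described. It is built on the standard `xml.sax.saxutils.XMLFilterBase` class, and `NamespaceLifter`, `EventLog` and `lift_events` are hypothetical names. In current Python versions you would normally just switch on `xml.sax.handler.feature_namespaces` instead. Namespace declarations are pushed onto the stack before any names on the same element are resolved, because an element may use a prefix it declares itself.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase
from xml.sax.xmlreader import AttributesNSImpl

XML_NS = "http://www.w3.org/XML/1998/namespace"


class NamespaceLifter(XMLFilterBase):
    """Turn startElement/endElement events into startElementNS/endElementNS."""

    def __init__(self, parent=None):
        super().__init__(parent)
        # One dict per open element, mapping prefix -> URI ("" = default namespace).
        self._stack = [{"xml": XML_NS, "": None}]

    def _lookup(self, prefix):
        for frame in reversed(self._stack):
            if prefix in frame:
                return frame[prefix]
        raise xml.sax.SAXException(f"undeclared namespace prefix {prefix!r}")

    def _resolve(self, qname, is_attribute=False):
        prefix, sep, local = qname.partition(":")
        if sep:
            return (self._lookup(prefix), local)
        # Unprefixed attributes are in no namespace; elements take the default.
        return (None if is_attribute else self._lookup(""), qname)

    def startElement(self, name, attrs):
        # Push this element's xmlns declarations before resolving any names.
        frame = {}
        for qname in attrs.getNames():
            if qname == "xmlns":
                frame[""] = attrs.getValue(qname) or None
            elif qname.startswith("xmlns:"):
                frame[qname[6:]] = attrs.getValue(qname)
        self._stack.append(frame)
        values, qnames = {}, {}
        for qname in attrs.getNames():
            if qname != "xmlns" and not qname.startswith("xmlns:"):
                key = self._resolve(qname, is_attribute=True)
                values[key] = attrs.getValue(qname)
                qnames[key] = qname
        super().startElementNS(self._resolve(name), name, AttributesNSImpl(values, qnames))

    def endElement(self, name):
        super().endElementNS(self._resolve(name), name)
        self._stack.pop()


class EventLog(ContentHandler):
    """Record the namespace-aware events that reach the application."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))


def lift_events(data):
    """Parse bytes with a non-namespace-aware parser, lifting events to *NS form."""
    lifter = NamespaceLifter(xml.sax.make_parser())
    log = EventLog()
    lifter.setContentHandler(log)
    lifter.parse(io.BytesIO(data))
    return log.events
```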

The overall strategy is to generate as much static HTML as possible—that is, instead of creating the HTML for the list of pictures afresh each time someone visits the site (which is what PHP and ASP, etc., do), I intend to generate it only when a new picture is added to the list. Since adding pictures will happen much more rarely than viewing the list, this reduces the overall load on the web server. The aim is to use CGI only in the pages that make a change (adding a picture or voting).
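Here is a sketch of this generate-on-write approach. It assumes the pictures are stored as image files in a single directory; the post's actual directory structure is not described, and `add_picture` and `rebuild_index` are hypothetical names. With this arrangement, viewing the list costs the server no more than serving a static file.

```python
import html
import os
import tempfile

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def rebuild_index(picture_dir):
    """Regenerate picture_dir/index.html from the image files currently present."""
    names = sorted(n for n in os.listdir(picture_dir) if n.lower().endswith(IMAGE_SUFFIXES))
    items = "\n".join(f'  <li><img src="{html.escape(n)}" alt=""></li>' for n in names)
    page = f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    # Write to a temporary file and rename it over the old page, so a visitor
    # never sees a half-written index.
    fd, tmp_path = tempfile.mkstemp(dir=picture_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(page)
    os.chmod(tmp_path, 0o644)  # mkstemp creates files readable only by the owner
    os.replace(tmp_path, os.path.join(picture_dir, "index.html"))


def add_picture(picture_dir, name, data):
    """Store a new picture, then rebuild the static index once."""
    if os.path.basename(name) != name:
        raise ValueError(f"bad picture name: {name!r}")
    with open(os.path.join(picture_dir, name), "wb") as f:
        f.write(data)
    rebuild_index(picture_dir)
```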

Processing XML with Python on Mac OS X

I’ve been amusing myself by concocting an RSS reader using XSLT to do the processing. XSLT can even handle the downloading of the RSS files, but this does not allow for caching or aggregating—so I thought I would knock something together in Python.
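The post does not say how the caching was done. One common approach, swapped in here as an assumption, is HTTP conditional GET. You remember each feed's ETag and Last-Modified headers and send them back as If-None-Match and If-Modified-Since, so an unchanged feed costs only a 304 reply. `FeedCache` is a hypothetical class using only the standard library.

```python
import urllib.error
import urllib.request


class FeedCache:
    """Remember each feed's body plus its ETag/Last-Modified validators."""

    def __init__(self):
        self._entries = {}  # url -> (etag, last_modified, body)

    def conditional_headers(self, url):
        """Headers asking the server to reply 304 if the feed is unchanged."""
        if url not in self._entries:
            return {}
        etag, last_modified, _ = self._entries[url]
        headers = {}
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
        return headers

    def record(self, url, response_headers, body):
        """Store a fresh copy of the feed together with its validators."""
        self._entries[url] = (
            response_headers.get("ETag"),
            response_headers.get("Last-Modified"),
            body,
        )

    def fetch(self, url, timeout=30):
        """Return the feed body, reusing the cached copy on 304 Not Modified."""
        request = urllib.request.Request(url, headers=self.conditional_headers(url))
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                body = response.read()
                self.record(url, response.headers, body)
                return body
        except urllib.error.HTTPError as err:
            # urllib reports 304 as an HTTPError rather than a normal response.
            if err.code == 304 and url in self._entries:
                return self._entries[url][2]
            raise
```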

Installing pyXML

Bizarre fantasies of an Extensible Mark-Up Language

I'm awake at 05:00, unable to sleep, and instead working on migrating the working copy of my web site to my laptop so that I can eventually retire my old desktop. One of the importunate thoughts bouncing about in my sleep-deprived brain is something someone at work said about some weirdos he'd heard of who actually (ab)use XML as some sort of (standard) generalized mark-up language. This bizarre (to him) concept involves choosing a set of XML tags to express the structure of the text (as if text could have structure!), which is ludicrous because how would anyone be able to read it? One would have to use XSLT to transform it into HTML, so why not use HTML in the first place, eh?
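For what it is worth, the transformation step the weirdos need is not much work. The sketch below uses Python's ElementTree instead of XSLT. It maps a made-up vocabulary (essay, title, para, term) onto HTML tags; both the vocabulary and the mapping are assumptions for illustration, and attributes are simply dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural vocabulary mapped onto presentational HTML.
TAG_MAP = {"essay": "div", "title": "h1", "para": "p", "term": "em"}


def to_html(elem):
    """Copy the tree, renaming each element; unknown tags become <span>."""
    out = ET.Element(TAG_MAP.get(elem.tag, "span"))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_html(child))
    return out


def render(xml_text):
    return ET.tostring(to_html(ET.fromstring(xml_text)), encoding="unicode")
```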

MU part 3: More MUD

Apologies to people reading this via LiveJournal's syndicated feed; a combination of my software converting every header into an RSS item and LiveJournal duplicating each item every time I edited the title has created a flurry of links to essays that I expect no-one but me has any interest in anyway.

MU compared with ...

I have been outlining a hypothetical alternative to XML that I am calling MU. In this note I compare MU to some other mark-up notations.

Why XML is bad for data serialization

A lot of people use XML to serialize data structures; with the XML parsers bundled with many programming environments it is easier than writing one's own parser. But XML was not designed with this in mind and contains too many traps caused by the mismatch between the XML object model and that of your application. A text format designed expressly for the purpose (my favourite is YAML) would be more convenient and safer.
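Here is one such trap, shown with a deliberately naive serializer in Python; `naive_dump` and `naive_load` are hypothetical helpers, not anyone's library. XML text carries no types, and a list of one element looks just like a single value. As a result, information is silently lost on the round trip.

```python
import xml.etree.ElementTree as ET


def naive_dump(data):
    """Serialize a flat dict as <record><key>value</key>...</record>."""
    root = ET.Element("record")
    for key, value in data.items():
        # A list becomes repeated elements with the same name.
        items = value if isinstance(value, list) else [value]
        for item in items:
            ET.SubElement(root, key).text = str(item)
    return ET.tostring(root)


def naive_load(xml_bytes):
    """Read the format back; repeated elements become a list."""
    result = {}
    for child in ET.fromstring(xml_bytes):
        if child.tag in result:
            previous = result[child.tag]
            as_list = previous if isinstance(previous, list) else [previous]
            result[child.tag] = as_list + [child.text]
        else:
            result[child.tag] = child.text
    return result


# Each of these round trips loses something:
print(naive_load(naive_dump({"n": 1})))           # the int comes back as a string
print(naive_load(naive_dump({"tags": ["xml"]})))  # a one-item list comes back as a plain value
print(naive_load(naive_dump({"note": ""})))       # an empty string comes back as None
```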

Website tweaks

I've started a gradual redesign of my personal webspace. Anyone who actually visits the page will have noticed that I added a background pattern taken from Squidfingers.com. I am in the process of revamping the links to other stuff I do on-line.

Migrating my website workspace, part 1

My website is maintained by a rather complex amalgamation of software, accreted over generations. Having migrated it from my old desktop lickity to my new(ish) PowerBook Ariel, I now want to migrate it again to my new server Tranq (a Tranquil PC T2); this will allow me to use cron to keep some parts up-to-date automatically.

Identity and CAP Alert Messages

CAP is the OASIS Common Alerting Protocol, a specification of an XML format for disseminating warnings of hurricanes, earthquakes, and suchlike. The CAP v1.1 format is mandated by the European R&D project I am working on. This is an inconvenience, because CAP is a badly flawed XML standard. I am going to discuss here some of the problems I have had with message identity as defined by CAP.
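As background, CAP 1.1 refers to earlier messages in its references element as whitespace-separated sender,identifier,sent triples. As I read the spec, this works only because sender and identifier may not contain spaces or commas. A sketch of parsing such a value in Python follows; `MessageRef` and `parse_references` are hypothetical names. Comparing `sent` as a point in time rather than as a string is my own assumption about what identity ought to mean, not something stated above.

```python
from datetime import datetime
from typing import NamedTuple


class MessageRef(NamedTuple):
    sender: str
    identifier: str
    sent: datetime  # timezone-aware, so equal instants compare equal


def parse_references(text):
    """Parse a CAP <references> value: whitespace-separated sender,identifier,sent."""
    refs = []
    for token in text.split():
        parts = token.split(",")
        if len(parts) != 3:
            raise ValueError(f"expected sender,identifier,sent, got {token!r}")
        sender, identifier, sent = parts
        refs.append(MessageRef(sender, identifier, datetime.fromisoformat(sent)))
    return refs
```

Under this assumption, two references that write the same instant with different UTC offsets identify the same message, even though their text differs.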