Maintain Static HTML with Kid Templating

I have redesigned the Alleged Literature front page. This isn't just a visual change, though: I have changed the way I generate the HTML so that I am using Kid templating rather than TclHTML. So here's a brief introduction to using Kid to maintain static HTML pages.

What is this Thing You Call Static HTML?

Computer folks use static versus dynamic to describe different sorts of document. Static means the contents of the document are inert: all you can do is display it. Dynamic documents contain programs that can change the document in response to something: each time you look at one, it might be different. There is an extra complexity on the web, because there are actually two places where a web page might be static or dynamic: the copy in your web browser, and the original it is a copy of, which lives on the web server. It is the second of these meanings that I am interested in here.

When you buy web-server space from an ISP (such as Black Cat Networks, in my case), static web sites are cheaper than dynamic ones. Partly this is because dynamic web sites require more processing power to make them work: each time someone requests /hobbit.aspx?who=frodo, a program is run that calculates what the page should look like and writes the HTML out; on a static web site, the page /hobbits/frodo.html is just a file on disc that is copied to the reader's computer without further processing. The Alleged Literature web site works this way.

On a static site, a photo gallery (such as Aviemore 1999) is a bunch of separate files, each with lots of elements the same as one another (e.g., the navigation). Creating these all by hand would be tedious and error-prone; instead I have a collection of programs that generate these pages for me, converting a list of photos into all the separate pages required to represent the gallery. The upshot of this is that a lot of this site is generated by software very similar to what would go in a dynamic web server; the difference is that I run it on my computer here to create the HTML files that get deployed on the web server.
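To make the idea concrete, here is a minimal Python sketch of turning a list of photos into one page per photo with shared navigation. It is not the code used on this site: the template text, the function names, and the file-naming rule are all assumptions made up for illustration.

```python
from html import escape
from string import Template

# Hypothetical page template; the shared navigation is written once, here.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<p class="nav">$prev_link <a href="index.html">Index</a> $next_link</p>
<p><img src="$src" alt="$title"/></p>
</body>
</html>
""")


def page_name(src):
    """Map a photo file name such as 'loch.jpg' to 'loch.html'."""
    return src.rsplit(".", 1)[0] + ".html"


def gallery_pages(photos):
    """Return {file name: HTML text} for a list of (src, title) pairs."""
    pages = {}
    for n, (src, title) in enumerate(photos):
        prev_link = next_link = ""
        if n > 0:
            prev_link = '<a href="%s">Previous</a>' % page_name(photos[n - 1][0])
        if n + 1 < len(photos):
            next_link = '<a href="%s">Next</a>' % page_name(photos[n + 1][0])
        # Escape the data so that & and < in a title cannot break the page.
        pages[page_name(src)] = PAGE.substitute(
            title=escape(title), src=escape(src),
            prev_link=prev_link, next_link=next_link)
    return pages
```

Writing each entry of the returned dictionary to disc gives the set of static files that would be uploaded to the web server.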

Generating Web Sites with TclHTML

A lot of the generation of pages on my site uses an HTML-generating solution I concocted some years back called TclHTML (inspired by cgi.tcl by Don Libes). This works by extending the programming language Tcl with commands for generating HTML. Tcl code looks like this:

for {set i 0} {$i < 10} {incr i} {
    puts "$i squared is [expr $i * $i]"
}

Using the TclHTML library you can generate an HTML table of the squares from 0 to 9 pretty easily:

table {
    tr {
        th x
        th "x[sup 2]"
    }
    for {set i 0} {$i < 10} {incr i} {
        tr {
            td "$i"
            td "[expr $i * $i]"
        }
    }
}

The HTML commands table, tr, td, and th work just like regular Tcl commands such as for and puts. Neat. If you dislike regular HTML syntax, this can even be an attractive syntax for writing documents in general.

One downside of embedding HTML in another programming language's syntax is that you have to be aware of two different syntaxes at the same time: you need to keep in the back of your mind an idea of what HTML you want to generate, and you have to avoid tripping over places where Tcl requires you to escape [...], whereas HTML requires you to know to escape & and <...>. It also means that whenever someone wants to suggest an HTML fragment to include in a page, you have to translate it from the HTML supplied to the Tcl that will generate it. This gets annoying.

There is another problem that lots of HTML-generating solutions suffer from---including ASP and PHP---which is that the HTML that determines the visual appearance of the web site is tangled up in the code that creates it. Changing a list of stuff from a table to a ul list or whatever becomes a non-trivial programming task.

XML Templating using Python with Kid

Since creating TclHTML I have switched from Tcl to Python as my favourite programming language---to the extent that the only time I use Tcl nowadays is working on the site. I am also less convinced by the idea of embedding HTML in a programming language than I was ten years ago. The whole XML ecosystem has grown up in the meantime, and it makes a lot of sense if your document is XML, especially if you have an XML-savvy editor like jEdit.

Kid is an XML-based templating language that stands TclHTML on its head: instead of valid Tcl with HTML embedded in it, it has a well-formed XML document with Python fragments embedded therein. For example:

<table>
    <tr><th>x</th><th>x<sup>2</sup></th></tr>
    <tr py:for="i in range(10)">
        <td py:content="i">5</td>
        <td py:content="i * i">25</td>
    </tr>
</table>

Because your Kid template is itself XML, you can display it in a browser to see how your layout looks (so it becomes more or less WYSIWYG). You can embed as much Python as you like, but it will be easier to have most of your Python code in a separate file and just include the minimum in the template.
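For comparison, the same table can be built in plain Python using only the standard library. This is a sketch of the output the template describes, not of how Kid works internally; the function name squares_table is made up for the example.

```python
import xml.etree.ElementTree as ET


def squares_table(limit=10):
    """Build the table of squares that the py:for template describes."""
    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    ET.SubElement(header, "th").text = "x"
    th = ET.SubElement(header, "th")
    th.text = "x"
    ET.SubElement(th, "sup").text = "2"
    # py:for repeats the <tr> once per value of i;
    # py:content replaces the placeholder text (5 and 25).
    for i in range(limit):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = str(i)
        ET.SubElement(row, "td").text = str(i * i)
    return table


html = ET.tostring(squares_table(), encoding="unicode")
```

The Python version is three times as long and says nothing about layout at a glance, which is much of the appeal of keeping the markup as markup.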

I first used Kid templates in TurboGears, a dynamic web-application framework, but for this weekend's experiment, I was using Kid stand-alone. The command

kid index.kid > index.html

runs my template called index.kid to create index.html. The template can use all of Python's methods of extracting data from the various files I keep stuff in to generate the index page, such as:

projects.data (the little badges for old projects),
tws.data (all about Jeremy Dennis's Weekly Strip),
percy.yml (all about Leckford's Percy Street comic), and even
index.yml (text for the index page).

I created the file index.yml for this. The idea is to leave as much of the content of the page out of index.kid as I can. Even the list of style sheets is controlled by index.yml, not index.kid. In the YAML file I can have things like this:

title: Alleged Literature
style sheets:
- css: 2006/main.css
- css: 2006/print.css
  media: print

(amongst other stuff). This is read in and used by the Kid template:

<?python
import syck
...
doc = syck.load(open('index.yml', 'r'))
...
?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://purl.org/kid/ns#">
    <head>
        <title py:content="doc['title']">Some Title Goes Here</title>
        <link py:for="ss in doc['style sheets']" 
                rel="stylesheet" type="text/css" media="${ss.get('media')}" href="${ss['css']}"/>
    </head>

The logic behind this split is to make it so that a lot of editing jobs can be achieved by editing the YAML file, which is more readable than the HTML version. (In the case of style sheets this is a bit of a stretch, but this is, after all, something of an experiment.)
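It may help to see the data structure involved. The sketch below writes out by hand the dictionary a YAML loader would be expected to return for the style-sheet fragment of index.yml, then builds the link elements from it. The function link_elements is hypothetical, and dropping the media attribute when its value is None mirrors what I understand Kid to do for an attribute whose expression evaluates to None.

```python
from xml.sax.saxutils import quoteattr

# What a YAML loader would be expected to return for the index.yml fragment.
doc = {
    "title": "Alleged Literature",
    "style sheets": [
        {"css": "2006/main.css"},
        {"css": "2006/print.css", "media": "print"},
    ],
}


def link_elements(style_sheets):
    """Return one <link /> string per style sheet entry."""
    links = []
    for ss in style_sheets:
        attrs = [("rel", "stylesheet"), ("type", "text/css"),
                 ("media", ss.get("media")), ("href", ss["css"])]
        # Assumption: an attribute whose value is None is left out entirely.
        text = " ".join("%s=%s" % (name, quoteattr(value))
                        for name, value in attrs if value is not None)
        links.append("<link %s />" % text)
    return links
```

Using ss.get('media') rather than ss['media'] is what lets an entry omit the media key without raising an error.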

Projects List

As a more interesting example there are the Weekly Strip (TWS) and projects listings. Projects are defined by a URL, the title, and the URL of the little image to use for the link; TWS strips similarly have the URL of the big picture, the URL of the detail picture, the title of the strip, and, optionally, the number of a LiveJournal page mentioning the strip. These are both listed in text files called projects.data and tws.data, that look like this:

2006-06-12.jpeg 2006-06-12.80.jpeg "Black Markers" 327226
20060119.gif 20060119-detail.gif "Snow Queen" 328276
20060624.jpg 20060624-detail.jpg "The Face that You Wear"
2006-07-07.jpeg 2006-07-07.80.png "Unexpressed Distress" 330855

except there are over 200 lines in the TWS file. The format is designed to work well with Tcl, but I use the same data file in the new Kid-based version. To do this, I created a small Python module datafiles.py that understands how to read this format and make it easy to use in the Kid code. In fact, all that happens in the template is the following few lines:

<?python
import random
import datafiles

projects = datafiles.Projects(doc['projects data'])
random.shuffle(projects.lines)
?>
...
<div py:for="project in projects" class="project">
    <p>
        <a href="${project.href}" title="${project.title}">
            <img src="${project.icon_src}" width="100" height="50" alt="${project.title}"/>
        </a>
    </p>
</div>

The actual appearance is created with CSS.

This keeps the amount of Python code in the .kid file to a minimum. The datafiles module should be reusable in future (assuming the future involves doing more work with TWS data). This gives me a better separation between the programming code and the presentation (the HTML), which should make it easier to change things in future.
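The datafiles module itself is not shown here, so the following is only a guess at how lines in the TWS format could be read in Python. The names Strip and parse_strips and the field names are invented for the example; they are not the module's real interface.

```python
import shlex
from collections import namedtuple

# Hypothetical record type; the field names are illustrative only.
Strip = namedtuple("Strip", "src detail_src title lj_number")


def parse_strips(lines):
    """Parse lines of: big picture, detail picture, "title", optional number."""
    strips = []
    for line in lines:
        # shlex.split honours the double quotes around multi-word titles.
        words = shlex.split(line)
        if not words:
            continue  # skip blank lines
        src, detail_src, title = words[:3]
        lj_number = int(words[3]) if len(words) > 3 else None
        strips.append(Strip(src, detail_src, title, lj_number))
    return strips
```

One limitation of this approach: Tcl lists may also quote words with braces, which shlex does not understand, so this sketch only covers lines that use double quotes like the samples above.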

Slight Quibble for HTML Support

Kid generates well-formed XML, but if I want my XHTML badge I need to generate valid XHTML; for that I need to add a doctype declaration to the top of the file. So the actual rule in the Makefile is as follows:

.kid.html:
    kid $< | sed -f xmltohtml.sed > $@.new
    mv $@.new $@

Where the file xmltohtml.sed contains the following:

1i\
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" \
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
/<[?]xml/d
s, */>, />,g

This is a sed program that inserts the doctype at the front of the file, deletes the XML declaration (which is, as always, redundant), and also ensures there is a space before the slash in empty elements (although as it turns out, Kid already inserts a space before the slash).
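If sed is not to hand, the same post-processing could be done in Python. This is a sketch that mirrors the three steps just described; the function name xml_to_html is an assumption, and it works on the whole document as a string rather than as a stream.

```python
import re

DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
           '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')


def xml_to_html(text):
    """Add the doctype, drop the XML declaration, and tidy empty elements."""
    # Drop any line holding the XML declaration.
    lines = [line for line in text.splitlines() if "<?xml" not in line]
    # Ensure exactly one space before the slash of an empty element.
    lines = [re.sub(r" */>", " />", line) for line in lines]
    return "\n".join([DOCTYPE] + lines) + "\n"
```

Matching the literal text "<?xml" avoids a portability trap in the sed version: some sed implementations treat a backslash followed by a question mark as a repetition operator rather than a literal question mark.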

Status

The task I set myself was to see if I could replace the old TclHTML index.th with one based on Kid. After about a day and a half's work, I have a working replacement for the old system. The separation of code, data, and presentation is better and the code is tidier; I think this will make it easier for me to modify the design in future, if only because Kid, Python, and XML are skills I practise almost every working day, whereas I am losing my fluency in Tcl through lack of exercise.

My intention now is to work towards slowly replacing TclHTML with Kid in those parts of the site I am still working on (some of the older galleries can just stay as they are, since I don't plan to work on them anyway). The next Caption web site will probably be done in the same way as well.