May 2003

Processing XML with Python on Mac OS X

I’ve been amusing myself by concocting an RSS reader using XSLT to do the processing. XSLT can even handle the downloading of the RSS files, but this does not allow for caching or aggregating—so I thought I would knock something together in Python.
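For comparison, the Python side of such a reader might start like the following sketch. It uses the standard library's xml.etree.ElementTree rather than pyXML, and assumes RSS 0.9x/2.0 feeds whose item elements are not in a namespace (RSS 1.0 feeds put items in an RDF namespace and would need the qualified name). The function name rss_items is hypothetical.

```python
import xml.etree.ElementTree as ET


def rss_items(rss_text):
    """Return (title, link) pairs for each <item> in an RSS 0.9x/2.0 document."""
    root = ET.fromstring(rss_text)
    # iter() searches the whole tree, not just direct children of the root.
    return [(item.findtext('title', default=''), item.findtext('link', default=''))
            for item in root.iter('item')]
```

Caching and aggregating would then be a matter of storing these pairs per feed and merging them.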

Installing pyXML

Picky Picky Game: No writing files

As mentioned earlier, I am working on a version of the Picky Picky Game that does not need to write to files (since some web servers will be set up that way for security reasons). In some ways this simplifies things, because it means there is only one URL hierarchy to worry about.

URL design

Backslashes in URLs

Had a bug in the Picky Picky Game where uploaded pictures might have backslashes left in their names (the picture name being derived from the file name supplied by the client computer). Strictly speaking, a backslash in a URL should be escaped as %5C, after which it is just another character. Some web browsers second-guess you, however, and replace unescaped backslashes with slashes (http://foo/bar\baz is treated as http://foo/bar/baz), with the result that these pictures failed to appear.
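The fix described below removes the backslashes. A complementary safeguard, sketched here with the standard library, is to percent-encode picture names whenever they go into a URL, so that a backslash travels as %5C and the browser has nothing to second-guess. The function name and base URL are hypothetical.

```python
from urllib.parse import quote


def picture_href(base_url, picture_name):
    """Join a base URL and a picture name, percent-encoding the name.

    safe='' escapes even '/', so a name can never add path segments;
    a backslash becomes %5C.
    """
    return base_url + quote(picture_name, safe='')
```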

The solution is, of course, to (a) change the code for translating file names into picture names so that it removes backslashes, and (b) fix the existing databases. ZODB makes the second part pretty easy; having acquired a Game instance from the database, you just run a script like

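import transaction  # ZODB's transaction package
import picky

# game: a Game instance already fetched from the ZODB root
for r in game.rounds.values():
    for pic in r.pictures:
        s = r.sanitizedName(pic.name, pic)
        if s != pic.name:
            pic.name = s
            pic.dataUri = s + picky.mediaTypeSuffix(pic.mediaType)
transaction.commit()  # nothing is saved until the transaction commits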

The function sanitizedName is the one that has to be fixed for part (a).
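A minimal sketch of the character clean-up such a function needs is below. The real sanitizedName is a method on the round and takes the picture as an extra argument; this standalone sanitized_name is hypothetical and only shows one way to strip backslashes (assuming the browser may send a full Windows path) and other URL-unfriendly characters.

```python
import re


def sanitized_name(client_filename):
    """Turn a client-supplied file name into a URL-friendly picture name."""
    # Browsers on Windows may send a full path such as C:\My Pictures\cat.png,
    # so keep only the part after the last backslash or slash.
    base = re.split(r'[\\/]', client_filename)[-1]
    # Drop the extension; a media-type suffix is added separately.
    stem = base.rsplit('.', 1)[0] if '.' in base else base
    # Replace runs of anything outside a conservative safe set with '-'.
    stem = re.sub(r'[^A-Za-z0-9_-]+', '-', stem).strip('-')
    return stem or 'picture'  # fall back if nothing usable is left
```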

CGI vs. If-Modified-Since (part 2)

Luckily, Apache honours the Status pseudo-header, and so my Picky Picky Game CGI script can issue response code 304 and it works. Yay. My legions of testers report that the downloading goes more smoothly now.
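The conditional-GET logic in a CGI script can be sketched with the standard library as follows. The function name and arguments are hypothetical; it assumes the picture's modification time is known as a timezone-aware datetime, and it relies on the server translating the Status pseudo-header into a real status line, as Apache does.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def cgi_response(environ, last_modified: datetime, body: bytes, media_type: str) -> bytes:
    """Return a complete CGI response (headers, blank line, body) as bytes."""
    last_modified = last_modified.replace(microsecond=0)  # HTTP dates are whole seconds
    headers = ['Last-Modified: ' + formatdate(last_modified.timestamp(), usegmt=True)]
    since = None
    ims = environ.get('HTTP_IF_MODIFIED_SINCE')
    if ims:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat as an unconditional request
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
    if since is not None and last_modified <= since:
        # Apache turns this pseudo-header into the real HTTP status line.
        head = ['Status: 304 Not Modified'] + headers
        return ('\n'.join(head) + '\n\n').encode('ascii')
    head = ['Status: 200 OK', 'Content-Type: ' + media_type,
            'Content-Length: %d' % len(body)] + headers
    return ('\n'.join(head) + '\n\n').encode('ascii') + body
```

A server that ignores Status would send the 304 branch as a 200 with an empty body, which is exactly the zero-length-picture symptom.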

The finished panels now also have text below them saying how many candidate panels were uploaded, and how many votes were recorded. The tricky bit was making sure it says ‘votes’ when there are 0 or more than 1 votes, and ‘vote’ when there is only one.

The dynamic HTML is generated using a template file that I have whimsically named a skin. This is a more-or-less XML file (meaning that I intend it to be well-formed XML, but actually process it as plain text). Mostly it is XHTML, but it can include elements in special namespaces like <p:version/>. These are replaced with text generated by Python functions or strings (e.g., <p:version/> is replaced by something like 0.9 <pdc 2003-05-19>). The comics panel at the left of the index page is produced with the following incantation:

<p:panel subtract="3" skin="index-panel"/>

This says to render the panel whose number is 3 less than the current panel (so if the current panel is #20, this shows #17), using the skin contained in the file index-panel.skin. This allows the hypothetical graphic designer of the Picky Picky Game considerable latitude in how panels are displayed.
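The substitution step can be sketched in a few lines. This is an illustration under assumptions, not the game's actual code: it treats the skin as plain text, recognizes only empty elements with the literal prefix p: and double-quoted attributes, and looks each element name up in a hypothetical handlers dictionary whose values are either ready-to-insert strings or functions taking the attributes as keyword arguments.

```python
import re

# Empty elements with the p: prefix, e.g. <p:panel subtract="3" skin="index-panel"/>
ELEMENT_RE = re.compile(r'<p:(\w+)((?:\s+\w+="[^"]*")*)\s*/>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def expand_skin(text, handlers):
    """Replace each <p:name .../> with handlers[name] or handlers[name](**attrs)."""
    def replace(match):
        name, attr_text = match.group(1), match.group(2)
        attrs = dict(ATTR_RE.findall(attr_text))
        handler = handlers[name]
        return handler(**attrs) if callable(handler) else handler
    return ELEMENT_RE.sub(replace, text)
```

Rendering the nested index-panel skin would then amount to calling expand_skin again, with handlers bound to the panel being shown.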

While rendering index-panel.skin, it comes across this fragment:

<p class="detail">
    [<p:pictureCount singular="candidate"/>,
    <p:voteCount singular="vote"/>.]
</p>

The p:pictureCount element sort of inherits the subtract attribute of the original p:panel, because it gives the number of pictures in panel #17 (as opposed to the current panel). The singular and plural attributes (if provided) specify text to follow the number (in the absence of an explicit plural attribute, it adds s to the singular).
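The singular/plural rule amounts to the following. count_text is a hypothetical name, and the sketch assumes that with no singular attribute only the number is shown.

```python
def count_text(n, singular=None, plural=None):
    """Format a count followed by the right noun, e.g. '1 vote', '0 votes'."""
    if singular is None:
        return str(n)  # assumption: no singular attribute means just the number
    if plural is None:
        plural = singular + 's'  # default plural: add "s" to the singular
    return '%d %s' % (n, singular if n == 1 else plural)
```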

Simplicity itself.

Update (2003-05-20): I have corrected a typo in the above URL. Sorry for any inconvenience.

CGI vs. If-Modified-Since and other stories

I demonstrated the Picky Picky Game prototype to the CAPTION committee. The main trouble was picture resources not being downloaded, or, oddly, vanishing when one refreshed the page. It worked better on the dial-up than the broadband connection (though Jo blames that on IE 6 being set up to cache nothing). I resolved to make an effort to sort out caching—or at least, the things my server needs to do to enable caching to work smoothly.

So, in my development system at home, I added If-Modified-Since and If-None-Match (etag) support to the routine that fetches picture data out of the database. I also added an Expires header set approximately one year into the future, which is how RFC 2616 says to mark a response as never expiring. Result: none of the pictures appear.
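The ETag and Expires side might look like this sketch. The names are hypothetical; it assumes stored picture data never changes, so a hash of the bytes can serve as the entity tag, and it treats a weak (W/) tag in If-None-Match as matching.

```python
import hashlib
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def cache_headers(data, now=None):
    """Return (etag, headers) for picture bytes that never change once stored."""
    if now is None:
        now = time.time()
    etag = '"%s"' % hashlib.md5(data).hexdigest()  # entity tags are quoted strings
    headers = {
        'ETag': etag,
        # About one year ahead: RFC 2616's convention for "never expires".
        'Expires': formatdate(now + ONE_YEAR, usegmt=True),
    }
    return etag, headers


def etag_matches(if_none_match, etag):
    """True if an If-None-Match header value lists etag, or is '*'."""
    if not if_none_match:
        return False
    for tag in if_none_match.split(','):
        tag = tag.strip()
        if tag.startswith('W/'):
            tag = tag[2:]  # weak comparison: ignore the W/ prefix
        if tag == '*' or tag == etag:
            return True
    return False
```

When etag_matches is true, the script answers with a 304 and no body, which is precisely the response the home server mangled.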

The problem is that the web server I am using at home always returns status code 200 for CGI scripts (it ignores the Status pseudo-header). As a result, my clever 304 (‘Not modified’) responses result in apparently zero-length data. Argh!

When I worked this out, I thought I would demonstrate to Jeremy that it worked in the stand-alone server (which does not use CGI). But lo! all the pictures failed to appear once more. So did the page itself. What gives?

This time the trouble was its logging function, which tried to resolve the client IP address. Now, I thought that the address used by my PowerBook did have reverse look-up in my local DNS, but in any case the server should not be indulging in DNS look-ups: on my system a look-up is a blocking operation that tends to lock the program up for 75 seconds. Luckily BaseHTTPServer makes it easy to override the function that indulges in DNS queries, and it all now runs smoothly.
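The override is small. In Python 2's BaseHTTPServer, log_message gets the client's name from address_string(), which performs a reverse name lookup; returning the bare IP address avoids the stall. This sketch uses Python 3's http.server, where address_string() has returned the plain IP address since Python 3.3, so there the override merely makes the intent explicit. The class name and the do_GET body are placeholders.

```python
from http.server import BaseHTTPRequestHandler


class NoLookupHandler(BaseHTTPRequestHandler):
    def address_string(self):
        # Used by log_message(): return the bare IP address, no DNS look-up.
        return self.client_address[0]

    def do_GET(self):
        # Placeholder response; a real server would render game pages here.
        body = b'Hello from the stand-alone server\n'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it, pass the class to http.server.HTTPServer, e.g. HTTPServer(('127.0.0.1', 8000), NoLookupHandler).serve_forever().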

On the positive side, I have made one cache-enhancing change that has worked, albeit only for the old panels (which saved their images as separate disc files rather than in ZODB). Simply put, there is now a third base URL (in addition to the base of the web application and the base URL for static files), used for picture files. This means that these old pictures are once again served as static files, with etags and caching the responsibility of my host HTTP server, not me.

Daily archives

I have now rejigged my log-rendering scripts (used to create this very web site) so that entries are archived on daily pages rather than monthly ones.

This was necessary because I’m writing more entries and longer ones. In recent months I had started splitting each entry into a leader paragraph that linked to the full article, but this was kind of clunky, and I never automated it. With one day’s posts per archive page this is less of an issue.

It would have made the topic pages awfully long, however, so I have switched the non-archive pages to show only the first article on the page in full, and the rest as links to the archive page. This is much tidier, and makes the site more like a collection of short essays than a weblog, which it really isn’t.
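The new arrangement can be sketched as two small functions. The Entry type, the YYYY/MM/DD.html URL scheme and the assumption that entry bodies are already HTML are all hypothetical; the point is the grouping by calendar day and the first-in-full, rest-as-links topic page.

```python
from datetime import datetime
from html import escape
from itertools import groupby
from typing import NamedTuple


class Entry(NamedTuple):
    when: datetime  # hypothetical: when the entry was posted
    title: str
    body: str       # assumed to be ready-made HTML


def daily_archives(entries):
    """Return [(date, [entries]), ...]: one archive page per day, newest first."""
    ordered = sorted(entries, key=lambda e: e.when, reverse=True)
    return [(day, list(group))
            for day, group in groupby(ordered, key=lambda e: e.when.date())]


def archive_href(entry):
    # Hypothetical URL scheme: one page per day, e.g. 2003/05/19.html
    return entry.when.strftime('%Y/%m/%d.html')


def topic_page(entries):
    """Newest entry in full; the rest as links to their daily archive pages."""
    ordered = sorted(entries, key=lambda e: e.when, reverse=True)
    if not ordered:
        return ''
    first, rest = ordered[0], ordered[1:]
    parts = ['<h2>%s</h2>\n%s' % (escape(first.title), first.body)]
    parts += ['<p><a href="%s">%s</a></p>' % (archive_href(e), escape(e.title))
              for e in rest]
    return '\n'.join(parts)
```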

There are still a few details to sort out. The year indexes are confused by the change in format (at the time I write this, the yearly archives omit May), and the newly introduced monthly indexes need to be generated. More on this as I get around to it.

Update (2003-05-21): I have created the new per-month index page. The Archives listing at the top right of the page now has a ‘by topics’ link.

On to Panel 17!

I have taken the liberty of bumping the Picky Picky Game forward to the next panel. I have also cranked the speed up a notch: each ‘week’ will be 3 days for the next little while.

The idea is to let us test the various behind-the-scenes mechanisms of the game so that we can risk mentioning it to people at COMICS 2003 in Bristol this bank-holiday weekend.

The CSS has been tweaked to not use auto margins, because they are not supported in MSIE.

Now with BMP support

To make it easier to create pictures for the Picky Picky Game, I have added code to automatically convert Microsoft Windows Bitmap (.bmp) files into PNGs.

This is straightforward because I was already using PIL to check the dimensions of the images. Converting to PNG (if the format is one PIL knows) is pretty simple:

import io

permittedTypes = ['image/png', 'image/jpeg', 'image/pjpeg', 'image/gif']
...
# im is the PIL image opened from the upload; imt is its media type.
if imt not in permittedTypes:
    buf = io.BytesIO()  # image data is bytes, not text
    im.save(buf, 'PNG')  # re-encode in a format every browser can display
    data = buf.getvalue()
    buf.close()
    logger.log(slog.WARNING, 'Converted your image from %s to image/png' % imt,
               'This may have led to a slight loss of image quality.')
    imt = 'image/png'

The above goes in the sequence of checks on uploaded images (after the check for width × height, but before the check for number of bytes). I think I spent longer creating a BMP image to test it on than I did writing the new code!

The advantage of BMP support is that, if you have Microsoft Windows, then you definitely have Microsoft Paint installed. So long as you know about Start menu → Programs → Accessories → Paint, and the Image → Attributes menu item, you can create panels for Picky Picky Game.

Fixed a JavaScript problem, one remains

Before going in to work today I have managed to fix one of the JavaScript problems (it causes MSIE to report ‘one error on page’), but only half-fixed the other (which causes artists’ names with links to vanish when you cycle through the panels). In the latter case, the name no longer vanishes, but, alas! the link does.

I think I need a JavaScript debugger, which in other words means installing Mozilla on my PowerBook (the only computer I own with enough welly for Mozilla). Ho hum.

Back to your regular schedule

I have returned the Picky Picky Game to its weekly schedule. The reason for going at double speed was to fill up the home page with non-gash pictures, and this has been achieved. So you have until Thursday night to upload a candidate for panel 19.

We just got back from COMICS 2003, which was fun. We mentioned Picky Picky to as many people as we could, so either the server will be overwhelmed with activity, or no-one will bother clicking through and we will be miserably ignored. Who knows what the future holds?

The journey back was a disaster: approximately 4½ hours, mostly spent waiting in cold, draughty stations. A nasty combination of reduced Oxford–Bristol service, engineering works and football game crowds made the train journeys particularly unpleasant; at least when there was a substitute bus we got to sit down...

Using Microsoft Paint to draw a panel

I have written a short note on how to use MSPAINT to draw a Picky Picky Game panel. The advantage of MSPAINT is that it is available on all Microsoft Windows computers, even ones not set up for image editing.

Much to my surprise, there is no longer a drawing program bundled with all Apple Macs: my PowerBook 12″ is without one.

Remember my details

I have added JavaScript to the Picky Picky Game upload form on caption.org to optionally remember your details for next time (using a cookie). This way you don’t have to enter your URL each time you upload a new panel.

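Debugging JavaScript without a JavaScript debugger is a real pain in the arse, and illustrates how subtle aspects of language design affect the experience of working in that language. Consider what happens when you misspell a variable name. In Python, a variable is created the first time you assign to it, and reading a name that has never been assigned to stops the program with an error message. In JavaScript, assigning to a misspelled name quietly creates a new global variable, and reading one throws a ReferenceError which, in a browser without a debugger or error console, leaves next to no visible trace. This means the following fragment is perfectly valid JavaScript, yet fails without complaint: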

var cookieHeader = this.$document.cookie;
var m = myRegexp.exec(cookiesHeader);
if (m) {
    // ... use the match info to process the cookies ...
}

The equivalent Python looks like this:

cookieHeader = self._document.cookie
m = myRegexp.search(cookiesHeader)
if m:
    ...  # use the match info to process the cookies

In the JavaScript version, the regexp (used to extract one cookie from the Cookie header) will mysteriously appear never to match, and you will spend ages scrutinizing the regexp and flipping through the documentation on what is and is not valid regexp syntax in JavaScript. In Python you will get an error message telling you that the name cookiesHeader is not defined, and immediately realise it is misspelled in the second line.
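For reference, a correctly spelled Python version of the extraction might look like this sketch. It assumes each value is stored URL-encoded (for example with JavaScript's encodeURIComponent); the function name and the cookie name used in the example are hypothetical. Python's http.cookies module could do the parsing instead, but a regexp keeps it parallel to the JavaScript.

```python
import re
from urllib.parse import unquote


def get_cookie(cookie_header, name):
    """Return the value of cookie `name` from a Cookie header string, or None."""
    # The name must start the header or follow a ';', so 'a' never matches 'xa=1'.
    pattern = r'(?:^|;\s*)' + re.escape(name) + r'=([^;]*)'
    m = re.search(pattern, cookie_header or '')
    return unquote(m.group(1)) if m else None
```

For example, get_cookie('a=1; pickyUrl=http%3A%2F%2Fexample.org%2F', 'pickyUrl') gives 'http://example.org/'.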

The tedious thing about testing the ‘remember me’ option is that it involves repeatedly doing the very thing it is supposed to be saving me from: entering my URL and details on the picture-upload form. Luckily I was testing on Safari, which has a form auto-completion feature that makes repeatedly filling in the form less annoying—but which also makes the ‘Remember me’ feature almost entirely redundant ;-)

Now with comments!

I have added a simple comment system to the Picky Picky Game. Go me!

There are all sorts of design considerations when it comes to on-line comments. I was aiming at simplicity, so it has no branching (threading), no HTML ... no nothing, basically. URLs in your posts get magically turned into links, and blank lines become paragraph breaks, but that is about all. There is one discussion page per panel.
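Those rules fit in a short function. This is a sketch, not the game's actual code: it assumes only http:// and https:// URLs are linked, that trailing punctuation belongs to the sentence rather than the URL, and that everything else is HTML-escaped so no markup gets through. The function names are hypothetical.

```python
import re
from html import escape

# http(s) URLs; the last character may not be sentence punctuation.
URL_RE = re.compile(r'https?://[^\s<>"]*[^\s<>".,;:!?)]')


def linkify(text):
    """HTML-escape text, turning bare URLs into links."""
    out, pos = [], 0
    for m in URL_RE.finditer(text):
        out.append(escape(text[pos:m.start()]))
        url = escape(m.group(0))
        out.append('<a href="%s">%s</a>' % (url, url))
        pos = m.end()
    out.append(escape(text[pos:]))
    return ''.join(out)


def format_comment(text):
    """Render a plain-text comment: blank lines separate paragraphs, no HTML allowed."""
    paragraphs = re.split(r'\n\s*\n', text.strip())
    return '\n'.join('<p>%s</p>' % linkify(p) for p in paragraphs)
```

Finding URLs in the raw text before escaping avoids accidentally swallowing an escaped quote (&quot;) into a link.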