November 2002
We spent Sunday in London. First to see Marsyas,
Anish
Kapoor’s giant red sculpture in the Tate Modern
(this year’s
Unilever Series installation). As Pete points out, it
is very big, designed to look like it had to be positively
rammed in to the Turbine Hall and almost didn’t fit. The
title suggests that it is (part of) the flayed skin of a
mythological figure, represented in Jack Kirby style as a
titanic giant.
We only stayed a moment, because we were really there to see the
Turner
Prize exhibits at the Tate Britain. The
place has been extensively remodelled since we last
visited—even the route from the tube stop to the entrance
was different—using similar white creamy stone to the
British Museum refit. Jeremy and I liked the Turner Prize
stuff.
I really liked Keith
Tyson’s collection of poster-sized sketches of crazy
ideas and images. (One of the complaints people were making in
the comments room was that modern artists cannot draw. This is
patently not the case with Keith Tyson’s idea-posters.) His
three-dimensional works start in poster form—in this case
he had the diagram for The Thinker (After Rodin)
and the sculpture itself in the same room. His
Thinker is sort of the reverse of Rodin’s:
instead of representing the appearance of a person
thinking, it does something that represents the thinking itself:
there is a computer system inside the column running a virtual
world simulation. If you put your ear to the column you can
hear its ‘thoughts’.
To round off the cultural evening we watched the low-budget
British post-apocalyptic movie 28 Days
Later. Afterwards as we walked home we passed some
of the landmarks from the deserted London of the film...
[Written 2002-11-06]
Update (2003-03-08): Corrected the spelling of Anish
Kapoor’s name and ‘Marsyas’.
Found some time to continue work on
the Picky Picky Game. I have something which, given a
graphics file, writes it in to the correct place in the
directory structure. Tonight’s task was a routine for
generating the index page, based on the pictures stored so far.
In the eventual web application, this routine will be invoked in
CGI scripts
whenever a new picture is added or vote recorded. For the
present I can just run the Python script (one of the ways
in which creating web apps in Python is less hassle than, say,
ASP .Net or webclasses).
The index page format is mainly controlled through a
‘skin’ file, index.skin. This has most
of the HTML, with special XML tags for interpolating the dynamic
content. This way hopefully Jeremy will be able to hack the
HTML without touching any of the application code. (The
immediate inspiration for the term skin comes from the
Helma Object Publisher system,
which does something similar, but using JavaScript.)
The picture metadata is written in XML, which is straightforward
enough except that Python’s native SAX support is broken:
it does not support XML namespaces! I have fixed this
with my own SAX filter, dubbed SaxLifter: it processes
startElement events by scanning the attributes for
namespace prefixes, maintaining a stack of namespace mappings,
and generating startElementNS events. Presumably
if I were using the XML-SIG or 4Thought enhancements to
Python things would work better. Sigh.
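For the curious, here is a rough sketch of the kind of filter described above, written in present-day Python rather than the Python of this entry. It is not the SaxLifter source: the class name NamespaceLifter, the expand helper and the decision to map an undeclared prefix to None are all assumptions made for illustration. It wraps a downstream handler, keeps a stack of prefix-to-URI mappings, and re-emits each startElement as a startElementNS.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl

XML_NS = "http://www.w3.org/XML/1998/namespace"


class NamespaceLifter(ContentHandler):
    """Turn namespace-unaware SAX events into namespace-aware ones.

    Hypothetical illustration only; startPrefixMapping events are not
    generated, and an undeclared prefix silently maps to None.
    """

    def __init__(self, downstream):
        ContentHandler.__init__(self)
        self.downstream = downstream
        # Stack of prefix -> URI dicts; the 'xml' prefix is predeclared.
        self.scopes = [{"xml": XML_NS}]

    def expand(self, qname, is_attribute=False):
        """Return the (uri, localname) pair for a qualified name."""
        if ":" in qname:
            prefix, local = qname.split(":", 1)
            return (self.scopes[-1].get(prefix), local)
        # Unprefixed attributes are in no namespace; unprefixed elements
        # take the default namespace, if one is in scope.
        uri = None if is_attribute else (self.scopes[-1].get("") or None)
        return (uri, qname)

    def startElement(self, name, attrs):
        scope = dict(self.scopes[-1])
        ordinary = []
        for qname, value in attrs.items():
            if qname == "xmlns":
                scope[""] = value
            elif qname.startswith("xmlns:"):
                scope[qname[len("xmlns:"):]] = value
            else:
                ordinary.append((qname, value))
        self.scopes.append(scope)
        values, qnames = {}, {}
        for qname, value in ordinary:
            key = self.expand(qname, is_attribute=True)
            values[key] = value
            qnames[key] = qname
        self.downstream.startElementNS(
            self.expand(name), name, AttributesNSImpl(values, qnames))

    def endElement(self, name):
        self.downstream.endElementNS(self.expand(name), name)
        self.scopes.pop()

    def characters(self, content):
        self.downstream.characters(content)
```

To use it, pass NamespaceLifter(handler) to xml.sax.parseString in place of the handler itself; the wrapped handler then only needs to implement the NS flavour of the callbacks.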
The overall strategy is to generate as much static HTML as
possible—that is, instead of creating the HTML for the
list of pictures afresh each time someone visits the site (which
is what PHP and ASP, etc., do), I intend to generate it
only when a new picture is added to the list. Since adding
pictures will happen much more rarely than viewing the list,
this reduces the overall load on the web server. The aim is to
use CGI only in
the pages that make a change (adding a picture or voting).
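A minimal sketch of the regenerate-on-write idea, in present-day Python. The function names, the list-of-titles input and the bare list markup are placeholders of mine (the real page comes from the skin file). Writing to a temporary file and renaming it is also my own addition, not something described above; it means a visitor never sees a half-written page.

```python
import html
import os
import tempfile


def render_index(titles):
    """Build a (deliberately tiny) HTML fragment listing the pictures."""
    items = "\n".join("<li>%s</li>" % html.escape(t) for t in titles)
    return "<ul>\n%s\n</ul>\n" % items


def regenerate_index(titles, path):
    """Rewrite the static index page; call this only when the data changes."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write beside the target so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(render_index(titles))
    os.chmod(tmp, 0o644)  # mkstemp creates the file private to its owner
    os.replace(tmp, path)  # swap the new page into place in one step
```

The CGI scripts that add a picture or record a vote would call regenerate_index as their last step; every ordinary page view is then served straight from disc.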
Opera 5 omits the boundary parameter when uploading files.
Lynx 2.8.2 does not support uploading files at all (but,
oddly, does generate multipart/form-data forms
properly—it even gives the charset
parameter to the content-type of its form items). Python’s
multifile module raises an exception on all of the
above, for some inputs.
I guess that if I want to handle uploaded images, I get to
write my own multipart/form-data parsers from
scratch. I have already done this in C++ for work;
I guess I can do it again in Python. Sigh.
I have now written a parser for HTML-4.0 file uploads (forms
with enctype multipart/form-data). It will need
some finessing to get character encodings to work right, but for
the simple cases I tried it uploaded files flawlessly, and
moreover, plugged in to the back-end script I mentioned in
an earlier installment.
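This is not the parser itself, just a minimal sketch of the core of such a thing in present-day Python, assuming the whole request body is already in memory as bytes. The names parse_multipart and guess_boundary are invented for illustration, and recovering the boundary from the first line of the body is only one conceivable workaround for a missing boundary parameter. Character encodings are ignored, as are nested multiparts.

```python
def guess_boundary(body):
    """Fallback for a missing boundary parameter: read it off the body.

    A well-formed body starts with '--' + boundary on its first line.
    Returns None if the body does not look like that.
    """
    first_line = body.split(b"\r\n", 1)[0]
    return first_line[2:] if first_line.startswith(b"--") else None


def parse_multipart(body, boundary):
    """Split a multipart/form-data body into (headers, content) pairs.

    headers is a dict keyed by lower-cased header name; content is bytes.
    """
    delimiter = b"\r\n--" + boundary
    parts = []
    # Prefix CRLF so the very first delimiter matches like the others.
    chunks = (b"\r\n" + body).split(delimiter)
    for chunk in chunks[1:]:  # chunks[0] is the (usually empty) preamble
        if chunk.startswith(b"--"):
            break  # '--boundary--' marks the end of the last part
        head, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            if b":" in line:
                name, value = line.split(b":", 1)
                key = name.strip().lower().decode("ascii")
                headers[key] = value.strip().decode("latin-1")
        parts.append((headers, content))
    return parts
```

The content of a file part comes back byte-for-byte, so it can be handed straight to whatever stores the picture.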
Alas! When I tried uploading from Jeremy’s NT box, my
Python program crashed with an IOError exception
with errno=EAGAIN. I guess I need to do some
sort of loop to fill my buffer. Ho hum.
It occurs to me I may be being harsh on Opera;
I notice that elsewhere they show a preference for
splitting MIME parameters over multiple physical lines. For
example, they use
Content-disposition: form-data;
    name="fred"
as opposed to
Content-disposition: form-data; name="fred"
It is just about possible that this confuses thttpd
so that it clips everything after the first CRLF when passing
the headers to my script via CGI...?
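For reference, a header split this way is 'folded': the continuation line begins with a space or tab, and a reader is meant to glue it back on to the line before. A small sketch of the unfolding step in present-day Python follows; the function name is my own, and it assumes the raw header block is available as one string with CRLF line endings.

```python
def unfold_headers(raw):
    """Return the header lines of raw with folded continuations rejoined."""
    lines = []
    for line in raw.split("\r\n"):
        if line[:1] in (" ", "\t") and lines:
            # Leading whitespace: this line continues the previous header.
            lines[-1] += " " + line.strip()
        elif line:
            lines.append(line)
    return lines
```

A server or script that skips this step and reads only up to the first CRLF would see the header without its name parameter, which would fit the symptom.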
On Monday I was troubled by EAGAIN
interruptions when reading in a CGI
script’s data. It turns out Python has a cgi
module already. But when I tried creating a script that
used that, it failed to work with Opera’s boundary-less
multipart (the built-in cgi module uses the
multifile module, which I tried and rejected
earlier).
I have tried looping until EAGAIN does not
happen—but I put a limit of 10 iterations so as not
to chew up the CPU. No dice. I have also tried using the
fcntl module to remove the O_NONBLOCK
flag from stdin. The result is that instead of crashing with
EAGAIN it waits indefinitely (and gets interrupted
by thttpd’s watchdog timer).
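A third approach, sketched here in present-day Python, is to wait with select until the descriptor is readable instead of spinning a fixed number of times. The function name and the 30-second default are assumptions of mine. It would not have cured a connection that never delivers the data, but it does avoid both the busy loop and the crash on EAGAIN.

```python
import os
import select


def read_exactly(fd, length, timeout=30.0):
    """Read up to length bytes from a possibly non-blocking descriptor.

    Returns fewer bytes only if end-of-file arrives first; raises
    TimeoutError if nothing turns up within timeout seconds.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        # Sleep until the kernel says there is something to read.
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            raise TimeoutError("no data within %s seconds" % timeout)
        try:
            chunk = os.read(fd, remaining)
        except BlockingIOError:
            continue  # EAGAIN after all: go back to waiting
        if not chunk:
            break  # end of file before the declared length arrived
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In a CGI script the length would come from the CONTENT_LENGTH environment variable and fd would be 0, the descriptor for stdin.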
The upshot of this is that I have the beginnings of a CGI
script that works if I connect to it from the same machine the
server is running on, but not if I connect to it from a
different machine (an NT box) on the same network.
The thing is, I know that people have successfully
written CGI programs in Python, and none of the examples
I find on-line have any mention of these phenomena.
Jo Charman has created a LiveJournal
‘syndication account’ for me. As a result you can
see my RSS feed, converted in to a LiveJournal
journal. She says that if you have a paid-for LiveJournal
account, you can add pdc to your
friends roster. And people can comment on the
LiveJournal pointers to my posts. Woohoo.
Updated (4 March 2007).
Updated URL. Corrected the spelling of Jo’s first name.
I thought I’d try out a different CGI framework, such
as jonpy, and this
requires Python 2.2. So I have now installed
Python 2.2 (carefully installing GNU db, expat, etc. first
so these modules will be built). During its self-tests,
I noticed that test_socket.py takes 2½
minutes (to do something that should take approximately no
time). Come to think of it, initiating connections to my Linux
box from Jeremy’s NT box also takes an inordinate amount of time. That
might be why initiating HTTP connections to my
thttpd instance also takes an inordinate amount of
time, so long that thttpd kills the CGI rather than
waste any more time. In other words, my CGI problems may mostly
stem from a broken network stack. Terrific. This is a
variation on Joel
Spolsky’s law of leaky abstractions: I would like to
be able to believe in POSIX’s abstraction of sockets as
being a lot like a file, but sadly it is all frinked up.
Another reason to spend a week or two installing Debian some time.
I think the way forward for now is probably to ignore the
network problems and cross my fingers when I install it on
the actual server. Given that a fairly thorough search of the WWW
and Netnews reveals no discussion of the sort of problems
I’ve been having, I am fairly sure it is some
freakish glitch in my computer...
Judging from their behaviour, it has long seemed that the powers
that be in Oxfordshire hold cyclists in contempt but don’t
feel able to actually come out and admit it. One of their
techniques for discouraging cycle commuters is to make it as
difficult to park a bicycle in town as it is to park a car.
Case in point: up until yesterday, I had the perfect place to
park my bicycle
each work day—it had a roof overhead and
plenty of railings to lock my machine to, and was not in
anybody’s way. The County Council (in
whose car park this spot was) has now fenced off this area with
a big steel fence, with a notice to the effect that it was
reserved for Environmental Services’ motorbikes and
bicycles. When I tried locking my bike to the outside of
the new cage, notices were put up ordering me not to (the
implication being that they would not mind damaging my bike in
the process of removing it).
The upshot of this is that there is nowhere to park near my
office. All the road signs are already occupied by the time
I get in. The foyer already has three bikes in it, but
I would not have used it anyway, having already had a bike
stolen from such a situation. In the end I locked it to
the back of a street sign in the next street along.
Psychologically it feels exposed out there in the middle of the
footpath. I much preferred keeping it out of
people’s way, on the grounds they will then feel less need
to vandalize it…
Even in a toy web application like the Picky Picky Game, it is
possible (but unlikely) that two people will want to upload a
picture at (nearly) the exact same moment. If two processes try
to write the same file at the same time, the results could be a
mess. It follows that we need to include something to
co-ordinate the changes.
Using ZEO to coordinate CGI scripts
My brother
Mike Cugley has put up another
bunch of photos of his son Darren. Here’s one of Dad,
my sister Rachel, Mike, Darren and me in Dad and Josie’s
garden in Ramsgate.
Converting my non-concurrent code to instead use a persistent
store coordinated through ZEO was pretty easy once I’d grokked the
documentation. In fact most of the work consisted of deleting
some of the routines for just-in-time reading back of the
metadata, since that is now taken care of for me by ZODB.
The final piece in the puzzle of my PPG platform is the Python
Imaging Library (PIL) from Secret Labs AB
(PythonWare). This makes it easy to check that the uploaded
images are the right dimensions, for example:
import StringIO
import Image  # from PIL

# data holds the raw bytes of the uploaded file
im = Image.open(StringIO.StringIO(data))
width, height = im.size
if width > game.maxWidth or height > game.maxHeight:
    log('Image is too large: the maximum is %d × %d.' \
            % (game.maxWidth, game.maxHeight), STOP)
    ok = 0
I don’t even need to know whether the image is a
PNG, JPEG, or GIF.
(Sunday night.)
Still nothing up for you to see yet, I’m afraid. (Apart
from anything else, I need to ask
my host to install a few
Python packages...) But I do now have the start of the
second CGI script, the one that accepts readers’ votes for
the current round of pictures. These votes are later used to
decide which picture to use for that panel of the comic strip.
At present the script accepts your vote but does not display
the votes in any way.
If you vote again, your previous ballot is silently overwritten.
I plan
to support Approval
Voting in future by having a page where you have a checkbox
for each candidate picture and can select as many as you like.
The word ‘your’ is a little misleading; we use
people’s IP addresses as their identifiers, which sort of
works most of the time, but means that people sharing a proxy
server will end up sharing a vote. The alternative (requiring
users to register in order to vote) is not likely to work
because no one will want to register.
Update (Monday night):
The voting form now shows you the pictures with checkboxes.
When you first visit the page, the picture you clicked on is
ticked, but then you can tick as many more as you like. Because
of the way HTML forms are processed, each form parameter is
potentially a sequence anyway, so the code for each time around
the voting form can be exactly the same. The code that adjusts
the totals is very simple:
def vote(self, uid, pns):
    """Register a vote from the user identified by uid.

    uid is an integer, uniquely identifying a voter.
    pns is a list of picture numbers
    """
    oldPns = self.userVotes.get(uid, [])
    if pns == oldPns:
        return
    for pn in oldPns:
        self.pictures[pn].nVotes -= 1
    for pn in pns:
        self.pictures[pn].nVotes += 1
    self.userVotes[uid] = pns
The first line retrieves that user’s old ballot, if any.
The first for statement reverses the effect (if
any) of their former vote; the second counts the new vote.
Finally the ‘ballot’ is saved for later. Behind the
scenes, ZODB takes care of reading the old data in off disc and
(when the transaction is committed) saving the updated
data.
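To see the bookkeeping in action without ZODB, here is a self-contained sketch in present-day Python in which an ordinary list and dict stand in for the persistent objects. The class names Picture and Round are invented for this illustration; only the body of vote follows the method described above.

```python
class Picture:
    """Stand-in for a stored picture; only the vote tally matters here."""

    def __init__(self):
        self.nVotes = 0


class Round:
    """Stand-in for one round of voting, held in memory rather than ZODB."""

    def __init__(self, nPictures):
        self.pictures = [Picture() for _ in range(nPictures)]
        self.userVotes = {}  # uid -> list of picture numbers

    def vote(self, uid, pns):
        """Replace uid's previous ballot (if any) with pns."""
        oldPns = self.userVotes.get(uid, [])
        if pns == oldPns:
            return
        for pn in oldPns:
            self.pictures[pn].nVotes -= 1  # undo the former ballot
        for pn in pns:
            self.pictures[pn].nVotes += 1  # count the new one
        self.userVotes[uid] = pns

    def tallies(self):
        return [p.nVotes for p in self.pictures]
```

If voter 1 ticks pictures 0 and 2, voter 2 ticks picture 2, and voter 1 then changes their mind and ticks only picture 1, the tallies end up as one vote each for pictures 1 and 2 and none for picture 0.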
My paid job involves writing a web application as well, except
this one uses Microsoft ASP .Net linked via ADO .Net to Microsoft SQL
Server® 2000. To do a similar job to the above
snippet, I would be writing two SQL stored procedures (one to retrieve
the existing ballot, one to alter the ballot). Invoking a
stored procedure is several more lines of code in the C♯ or VB .Net layer as you create a Command
object, add parameters to it, execute it, and dispose of the
remains. (Or you can create DataSet objects which are even
worse, but have specialized wizards to help you draft the code.)
The actual algorithm (the encoding of the business logic) would
be buried in dozens of lines of boilerplate. By comparison, the Python+ZODB implementation is a
miracle of concision and clarity. The
ZOPE people deserve much kudos.
On my badly broken Linux desktop, the Gimp is missing its
file-saving plug-ins, so it cannot save files except in a format
I cannot use. XPaint does not exist, for some reason. The
venerable bitmap program does work, but can only
produce X11 bitmap files (which are black and white only). How
then to produce colour icons for my Picky Picky Game mock-ups?
Using PBMPlus to colourize
monochrome bitmaps