40 entries tagged picky

Picky Picky periodicity

My back-burner project for the CAPTION web site is something that I shall give the working name of Picky Picky Game. This game will work in rounds, with everyone able to vote for a picture from this round at the same time as people are submitting pictures for the next round. The idea is that we make a comic strip out of the favourite pictures from each round.

A round might last a week or a month (depending on how often people submit new pictures), so I want to make it a configuration parameter. The code I was working on today was the routine that works out the current round number, given today’s date, the start date, and the period. For example, if we had 2-week rounds and had started on 1 January 2002, then we would today (2002-10-31) be in Round 21.
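A minimal sketch of such a routine, assuming the period is expressed as a whole number of days and that rounds are counted from zero (which is what makes the example above come out as Round 21). The function and parameter names are illustrative, not the ones used in the game itself.

```python
from datetime import date


def round_number(today, start, period_days):
    """Return the zero-based number of the round that contains `today`."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    elapsed = (today - start).days
    if elapsed < 0:
        raise ValueError("the game has not started yet")
    # Integer division: days 0..13 are round 0, days 14..27 are round 1, ...
    return elapsed // period_days


# 303 days separate 2002-01-01 from 2002-10-31, and 303 // 14 == 21.
print(round_number(date(2002, 10, 31), date(2002, 1, 1), 14))
```

A period measured in calendar months would need different arithmetic (counting months rather than days), which this sketch does not attempt.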

Picky, Skin, SaxLifter

Found some time to continue work on the Picky Picky Game. I have something which, given a graphics file, writes it in to the correct place in the directory structure. Tonight’s task was a routine for generating the index page, based on the pictures stored so far. In the eventual web application, this routine will be invoked in CGI scripts whenever a new picture is added or vote recorded. For the present I can just run the Python script (one of the ways in which creating web apps in Python is less hassle than, say, ASP .Net or webclasses).

The index page format is mainly controlled through a ‘skin’ file index.skin. This has most of the HTML, with special XML tags for interpolating the dynamic content. This way hopefully Jeremy will be able to hack the HTML without touching any of the application code. (The immediate inspiration for the term skin comes from the Helma Object Publisher system, which does something similar, but using JavaScript.)

The picture metadata is written in XML which is straightforward enough except that Python’s native SAX support is broken: it does not support XML namespaces! I have fixed this with my own SAX filter dubbed SaxLifter: it processes startElement events by scanning the attributes for namespace prefixes, maintaining a stack of namespace mappings, and generating startElementNS events. Presumably if I were using the XML-SIG or Fourthought enhancements to Python things would work better. Sigh.
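The following is a rough sketch of a filter of the kind described, written for modern Python 3 rather than the Python of the time. It is not the SaxLifter source: the class name NamespaceLifter, the downstream attribute, and the sample document are all assumptions made for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl


class NamespaceLifter(ContentHandler):
    """Receive plain startElement events and re-emit them as startElementNS."""

    def __init__(self, downstream):
        ContentHandler.__init__(self)
        self.downstream = downstream
        # Stack of prefix -> URI mappings; the xml prefix is always bound.
        self.scopes = [{'xml': 'http://www.w3.org/XML/1998/namespace'}]

    def _expand(self, qname, use_default):
        if ':' in qname:
            prefix, local = qname.split(':', 1)
            return (self.scopes[-1][prefix], local)  # KeyError if undeclared
        # Unprefixed attributes never take the default namespace.
        uri = self.scopes[-1].get('') if use_default else None
        return (uri, qname)

    def startElement(self, name, attrs):
        scope = dict(self.scopes[-1])
        plain = {}
        for key, value in attrs.items():
            if key == 'xmlns':
                scope[''] = value            # default namespace declaration
            elif key.startswith('xmlns:'):
                scope[key[6:]] = value       # prefixed namespace declaration
            else:
                plain[key] = value
        self.scopes.append(scope)
        values, qnames = {}, {}
        for key, value in plain.items():
            expanded = self._expand(key, use_default=False)
            values[expanded] = value
            qnames[expanded] = key
        self.downstream.startElementNS(
            self._expand(name, use_default=True), name,
            AttributesNSImpl(values, qnames))

    def endElement(self, name):
        self.downstream.endElementNS(
            self._expand(name, use_default=True), name)
        self.scopes.pop()

    def characters(self, content):
        self.downstream.characters(content)


class PrintNames(ContentHandler):
    """Toy downstream handler that prints each expanded element name."""

    def startElementNS(self, name, qname, attrs):
        print(name, qname)


xml.sax.parseString(b'<a xmlns:p="urn:example"><p:b k="v"/></a>',
                    NamespaceLifter(PrintNames()))
```

The stack matters because a namespace declaration is only in force for the element that carries it and that element's descendants, so the mapping has to be popped again in endElement.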

The overall strategy is to generate as much static HTML as possible—that is, instead of creating the HTML for the list of pictures afresh each time someone visits the site (which is what PHP and ASP, etc., do), I intend to generate it only when a new picture is added to the list. Since adding pictures will happen much more rarely than viewing the list, this reduces the overall load on the web server. The aim is to use CGI only in the pages that make a change (adding a picture or voting).

Picky Picky Game: Upload pictures, EAGAIN

I have now written a parser for HTML-4.0 file uploads (forms with enctype multipart/form-data). It will need some finessing to get character encodings to work right, but for the simple cases I tried it uploaded files flawlessly, and moreover, plugged in to the back-end script I mentioned in an earlier installment.

Alas! When I tried uploading from Jeremy’s NT box, my Python program crashed with an IOError exception with errno=EAGAIN. I guess I need to do some sort of loop to fill my buffer. Ho hum.

More on Opera’s boundaries

It occurs to me I may be being harsh on Opera; I notice that elsewhere they show a preference for splitting MIME parameters over multiple physical lines. For example, they use

Content-disposition: form-data;
			name="fred"

as opposed to

Content-disposition: form-data; name="fred"

It is just about possible that this confuses thttpd so that it clips everything after the first CRLF when passing the headers to my script via CGI...?
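For illustration, here is a small sketch of how a receiver can undo this kind of folding before looking at the parameters. The function name is made up; the rule it applies (a line break followed by spaces or tabs continues the previous header line) is the usual MIME header-folding convention.

```python
import re


def unfold_headers(raw):
    """Join continuation lines onto the header line they belong to."""
    # A line break followed by whitespace marks a folded header, so
    # collapse the break and the leading whitespace into one space.
    return re.sub(r'\r?\n[ \t]+', ' ', raw)


folded = 'Content-disposition: form-data;\r\n\t\t\tname="fred"\r\n'
print(repr(unfold_headers(folded)))
```

After unfolding, the two forms shown above are identical, so a parser that unfolds first need not care which style the browser prefers.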

CGI upload woes

On Monday I was troubled by EAGAIN interruptions when reading in a CGI script’s data. It turns out Python has a cgi module already. But when I tried creating a script that used that, it failed to work with Opera’s boundary-less multipart (the built-in cgi module uses the multifile module, which I tried and rejected earlier).

I have tried looping until EAGAIN does not happen—but I put a limit of 10 iterations so as not to chew up the CPU. No dice. I have also tried using the fcntl module to remove the O_NONBLOCK flag from stdin. The result is that instead of crashing with EAGAIN it waits indefinitely (and gets interrupted by thttpd’s watchdog timer).
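For reference, a retry loop of this general kind can be sketched as follows in modern Python 3, where EAGAIN surfaces as BlockingIOError. This version waits in select instead of spinning, so it does not chew up the CPU. The function name and the timeout value are assumptions, and nothing here would cure a problem that really lies in the network stack.

```python
import os
import select


def read_exactly(fd, nbytes, timeout=10.0):
    """Read up to nbytes from a possibly non-blocking file descriptor."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # Sleep until the descriptor is readable (or the timeout expires).
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError("no data arrived within %.1f s" % timeout)
        try:
            chunk = os.read(fd, remaining)
        except BlockingIOError:
            continue  # EAGAIN: nothing there after all, so wait again
        if not chunk:
            break  # end of file before nbytes were read
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)
```

In a CGI script the descriptor would be 0 (standard input) and nbytes would come from the CONTENT_LENGTH environment variable.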

The upshot of this is that I have the beginnings of a CGI script that works if I connect to it from the same machine the server is running on, but not if I connect to it from a different machine (an NT box) on the same network. The thing is, I know that people have successfully written CGI programs in Python, and none of the examples I find on-line have any mention of these phenomena.

EAGAIN, again

I thought I’d try out a different CGI framework, such as jonpy, and this requires Python 2.2. So I have now installed Python 2.2 (carefully installing GNU db, expat, etc. first so these modules will be built). During its self-tests, I noticed that test_socket.py takes 2½ minutes (to do something that should take approximately no time). Come to think of it, initiating connections to my Linux box from Jeremy’s NT box also takes an inordinate amount of time. That might be why initiating HTTP connections to my thttpd instance also takes an inordinate amount of time, so long that thttpd kills the CGI rather than waste any more time. In other words, my CGI problems may mostly stem from a broken network stack. Terrific. This is a variation on Joel Spolsky’s law of leaky abstractions: I would like to be able to believe in POSIX’s abstraction of sockets as being a lot like a file, but sadly it is all frinked up. Another reason to spend a week or two installing Debian some time.

I think the way forward for now is probably to ignore the network problems and cross my fingers when I install it on the actual server. Given that a fairly thorough search of the WWW and Netnews reveals no discussion of the sort of problems I’ve been having, I am fairly sure it is some freakish glitch in my computer...

Picky Picky Game: ZEO + CGI

Even in a toy web application like the Picky Picky Game, it is possible (but unlikely) that two people will want to upload a picture at (nearly) the exact same moment. If two processes try to write the same file at the same time, the results could be a mess. It follows that we need to include something to co-ordinate the changes.

Using ZEO to coordinate CGI scripts

ZEO just works

Converting my non-concurrent code to instead use a persistent store coordinated through ZEO is pretty easy once I’d grokked the documentation. In fact most of the work consisted of deleting some of the routines for just-in-time reading back of the metadata, since that is now taken care of for me by ZODB.

Picky Picky Game: The Joy of PIL

The final piece in the puzzle of my PPG platform is the Python Imaging Library (PIL) from Secret Labs AB (PythonWare). This makes it easy to check that the uploaded images are the right dimensions, for example:

im = Image.open(StringIO.StringIO(data))
width, height = im.size
if width > game.maxWidth or height > game.maxHeight:
    log('Image is too large: the maximum is %d × %d.' \
            % (game.maxWidth, game.maxHeight), STOP)
    ok = 0

I don’t even need to know whether the image is a PNG, JPEG, or GIF.

Picky Picky Game: minimal voting

(Sunday night.) Still nothing up for you to see yet, I’m afraid. (Apart from anything else, I need to ask my host to install a few Python packages...) But I do now have the start of the second CGI script, the one that accepts readers’ votes for the current round of pictures. These votes are later used to decide which picture to use for that panel of the comic strip.

At present the script accepts your vote but does not display it in any way. If you vote again, your previous ballot is silently overwritten. I plan to support Approval Voting in future by having a page where you have a checkbox for each candidate picture and can select as many as you like.

The word ‘your’ is a little misleading; we use people’s IP addresses as their identifiers, which sort of works most of the time, but means that people sharing a proxy server will end up sharing a vote. The alternative (requiring users to register in order to vote) is not likely to work because no one will want to register.

Update (Monday night): The voting form now shows you the pictures with checkboxes. When you first visit the page, the picture you clicked on is ticked, but then you can tick as many more as you like. Because of the way HTML forms are processed, each form parameter is potentially a sequence anyway, so the code for each time around the voting form can be exactly the same. The code that adjusts the totals is very simple:

def vote(self, uid, pns):
    """Register a vote from the user identified by uid.

    uid is an integer, uniquely identifying a voter.
    pns is a list of picture numbers
    """
    oldPns = self.userVotes.get(uid, [])
    if pns == oldPns:
        return
    for pn in oldPns:
        self.pictures[pn].nVotes += -1
    for pn in pns:
        self.pictures[pn].nVotes += 1
    self.userVotes[uid] = pns

The first line retrieves that user’s old ballot, if any. The first for statement reverses the effect (if any) of their former vote, the second counts the new vote. Finally the ‘ballot’ is saved for later. Behind the scenes, ZODB takes care of reading the old data in off disc and (when the transaction is committed) saving the updated data.
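The remark that each form parameter is potentially a sequence can be illustrated with the standard library’s query-string parser in modern Python 3. This is only a sketch: the field name pn and the helper function are assumptions, not the names used by the actual voting form.

```python
from urllib.parse import parse_qs


def picture_numbers(query_string, field='pn'):
    """Return the ticked picture numbers as a list of ints (possibly empty)."""
    # parse_qs maps every field name to a list of values, whether the
    # field occurred once or many times in the submitted form.
    form = parse_qs(query_string)
    return [int(value) for value in form.get(field, [])]


print(picture_numbers('pn=4'))            # one box ticked
print(picture_numbers('pn=4&pn=7&pn=2'))  # several boxes ticked
```

Either way the result is a list, so it can be handed to a method like vote without any special case for the first visit.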

My paid job involves writing a web application as well, except this one uses Microsoft ASP .Net linked via ADO .Net to Microsoft SQL Server® 2000. To do a similar job to the above snippet, I would be writing two SQL stored procedures (one to retrieve the existing ballot, one to alter the ballot). Invoking a stored procedure is several more lines of code in the C♯ or VB .Net layer as you create a Command object, add parameters to it, execute it, and dispose of the remains. (Or you can create DataSet objects which are even worse, but have specialized wizards to help you draft the code.) The actual algorithm (the encoding of the business logic) would be buried in dozens of lines of boilerplate. By comparison, the Python+ZODB implementation is a miracle of concision and clarity. The ZOPE people deserve much kudos.

Picky Picky Game: Tidying up the Python code

Having got the first working version of the Picky Picky Game, I have naturally now pulled it apart again. Now that it is in a state where it makes sense to package up a version for Adrian to try installing, I decided I had better get the module and package names right, since they will be harder to change later.

I have reorganized my Python classes in to their own package pdc (designed to prevent name collisions with WWW-oriented packages by other people). I also changed some of the file names—so that ‘import httputils’ becomes ‘from pdc import www’.

There is now a proper unit-test suite for the www module (which has functions like urlDecode, urlResolve, and xmlencode). This is easier to do for this module than the others, which tend to involve creating scads of HTML text which will be hard to check for errors. For the URL-manipulating functions, the unit tests turned out to be invaluable: there are a lot of corner cases that I only sorted out because I had tests for all of them.
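To give a flavour of what corner-case tests for URL functions look like, here is a sketch using the unittest module in modern Python 3. The www module’s own functions are not shown here, so the standard library’s urljoin and unquote stand in for urlResolve and urlDecode; the test names and the example.org URLs are invented.

```python
import unittest
from urllib.parse import unquote, urljoin


class UrlCornerCases(unittest.TestCase):
    BASE = 'http://example.org/a/b'

    def test_sibling(self):
        # A bare name replaces the last path segment of the base.
        self.assertEqual(urljoin(self.BASE, 'c'), 'http://example.org/a/c')

    def test_parent(self):
        self.assertEqual(urljoin(self.BASE, '../c'), 'http://example.org/c')

    def test_absolute_path(self):
        self.assertEqual(urljoin(self.BASE, '/c'), 'http://example.org/c')

    def test_empty_reference(self):
        # An empty reference resolves to the base itself.
        self.assertEqual(urljoin(self.BASE, ''), self.BASE)

    def test_decode_space(self):
        self.assertEqual(unquote('a%20b'), 'a b')


def run_tests():
    """Run the suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UrlCornerCases)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each corner case gets its own small test method, so a failure report names exactly which case broke.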

PPG: Testing time

Spent another dollop of time on the Picky Picky Game, moving code in to my www module. This is a slow process because I have decided that this module is going to have a complete set of unit tests (wwwtests.py). Writing tests after the fact can be a little depressing: it keeps digging up bugs (i.e., mistakes) you made when writing the function in the first place. The Extreme Programming gurus say this is why writing the tests first is psychologically important (as well as being important to their methodology) because you end the day successfully passing tests that failed earlier in the day, rather than depressingly writing tests that either never get triggered or which show up flaws in your precious code.

Anyway, the upshot of today’s not particularly intensive work is that it does more or less what it did last week, but is maybe a bit more reliable than it was before.

Picky Picky Game: Refactoring the CGI scripts

In the process of getting the CGI scripts to work again after reorganizing the libraries, I have further refactored them by moving the request-processing code in to the Picky Picky Game library, leaving the CGI script to contain just a few configuration parameters and an invocation of the library routine. This style is not unlike that used by Joe Gregorio in his Well-Formed Web experiment.

There are various advantages to moving most of the code out of the CGIs themselves: better information hiding, for example (which can be considered a security feature as well as good programming practice). It also allows the bulk of the code to be stored in byte-compiled form on disc, which might make processing requests a little faster.

Picky Picky Game release 0.1

I have created an experimental, pre-alpha, test-of-concept, categorically not finished or complete package that is a snapshot of the Picky Picky Game development so far (picky-0.1.tgz, picky01.zip). An enthusiastic web master with a Python-compatible server should be able to install this and make it go. What’s more, if this version can be installed, then future versions should also be installable (since I don’t intend to require any additional features). But don’t hold me to that... Your mileage may vary.

There is one important unresolved issue, the ‘EAGAIN’ problem, which I decided to put to one side for now. I hope to use this snapshot to test the problem in different environments.

Picky + REST?

Does it make sense to rework the URLs for the Picky Picky Game so they more resemble REST-inspired interfaces like the Well-Formed Web experiment? I have belatedly returned to the project and started by creating a self-contained server version (as opposed to the CGI-based one created earlier). This gives me more control over the URL scheme used for the (proposed) site.

More irrelevant musings

Update (18 March 2003). Really what I am talking about here is more to do with cool URLs (to use the term coined by Tim Berners-Lee) than REST per se.

Picky Picky Game: HTML 4 versus REST

REST is largely based on using HTTP as originally designed—which includes respecting the intended semantics of the methods GET and POST (basically, requests that add or change things should use POST and not GET; requests that view information without altering it should use GET, not POST). A flaw in the HTML 4 definition makes this annoyingly difficult.

Serving my own damn files

I am rethinking my original plan for the Picky Picky Game, which was to store resources in files as often as possible. For example, index.html is a static file (not dynamically generated every time someone visits it). This requires that when something happens that means index.html should change, this file has to be updated.

Pros and cons

Picky Picky Game: No writing files

As mentioned earlier, I am working on a version of the Picky Picky Game that does not need to write to files (since some web servers will be set up that way for security reasons). In some ways this simplifies things, because it means there is only one URL hierarchy to worry about.

URL design

Backslashes in URLs

Had a bug in the Picky Picky Game where uploaded pictures might have backslashes left in their names (the picture name being derived from the file name supplied by the client computer). Technically it is OK to have backslashes in a URL, and they should be treated like any other character. Some web browsers second-guess you, however, and replace backslashes with slashes (http://foo/bar\baz is treated as http://foo/bar/baz), with the result that these pictures failed to appear.

The solution is, of course, to (a) change the code for translating file names in to picture names so that it removes backslashes, and (b) fix the existing databases. ZODB makes the second part pretty easy; having acquired a Game instance from the database, you just run a script like

for rn, r in game.rounds.items():
    for pic in r.pictures:
        s = r.sanitizedName(pic.name, pic)
        if s != pic.name:
            pic.name = s
            pic.dataUri = s + picky.mediaTypeSuffix(pic.mediaType)

The function sanitizedName is the one that has to be fixed for part (a).
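The real sanitizedName is not reproduced here, so the following is only a guess at the shape such a function might take, in modern Python 3. It is a standalone function with a made-up name and signature, and it assumes the picture name should be the base file name without its directory or extension, restricted to a conservative set of characters.

```python
import re


def sanitized_name(file_name):
    """Derive a URL-safe picture name from a client-supplied file name."""
    # Clients may send a full path using either kind of slash, so keep
    # only the part after the last slash or backslash.
    base = re.split(r'[\\/]', file_name)[-1]
    # Drop the extension (assumption: a suffix is added back elsewhere).
    stem = base.rsplit('.', 1)[0] if '.' in base else base
    # Replace runs of anything outside letters, digits, _ and - with a dash.
    cleaned = re.sub(r'[^A-Za-z0-9_-]+', '-', stem).strip('-')
    return cleaned or 'picture'


print(sanitized_name(r'C:\My Pictures\panel 17.png'))
```

Whitelisting the permitted characters, rather than blacklisting backslashes alone, should also keep other awkward characters out of the URLs.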

CGI vs. If-Modified-Since (part 2)

Luckily, Apache honours the Status pseudo-header, and so my Picky Picky Game CGI script can issue response code 304 and it works. Yay. My legions of testers report that the downloading goes more smoothly now.

The finished panels now also have text below them saying how many candidate panels were uploaded, and how many votes were recorded. The tricky bit was making sure it says ‘votes’ when there are 0 or more than 1 votes, and ‘vote’ when there is only one.

The dynamic HTML is generated using a template file that I have whimsically named a skin. This is a more-or-less XML file (meaning that I intend it to be well-formed XML, but actually process it as plain text). Mostly it is XHTML, but it can include elements in special namespaces like <p:version/>. These are replaced with text generated by Python functions or strings (e.g., <p:version/> is replaced by something like 0.9 <pdc 2003-05-19>). The comics panel at the left of the index page is produced with the following incantation:

<p:panel subtract="3" skin="index-panel"/>

This says to render the panel whose number is 3 less than the current panel (so if the current panel is #20, this shows #17), using the skin contained in the file index-panel.skin. This allows the hypothetical graphic designer of the Picky Picky Game considerable latitude in how panels are displayed.
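As a rough illustration of processing a skin as plain text, the sketch below (modern Python 3) replaces each empty p: element with the result of a handler. It is not the game’s actual skin engine: the regular expressions, the render_skin function, and the toy handlers are all assumptions.

```python
import re

# An empty element in the p: namespace, with optional name="value" attributes.
TAG = re.compile(r'<p:(\w+)((?:\s+\w+="[^"]*")*)\s*/>')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def render_skin(text, handlers):
    """Replace each <p:name .../> element using the matching handler."""
    def substitute(match):
        name = match.group(1)
        attrs = dict(ATTR.findall(match.group(2)))
        handler = handlers.get(name)
        if handler is None:
            return match.group(0)  # leave unknown elements untouched
        # A handler is either a function taking the attributes as
        # keyword arguments, or a plain string.
        return handler(**attrs) if callable(handler) else str(handler)
    return TAG.sub(substitute, text)


current_panel = 20  # hypothetical current panel number
handlers = {
    'version': '0.9',
    'panel': lambda subtract='0', skin='': 'panel #%d' % (
        current_panel - int(subtract)),
}
print(render_skin('<p:panel subtract="3" skin="index-panel"/>', handlers))
```

A real panel handler would presumably load the named skin file and render it recursively; the toy one here only computes the panel number.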

While rendering index-panel.skin, it comes across this fragment:

<p class="detail">
    [<p:pictureCount singular="candidate"/>,
    <p:voteCount singular="vote"/>.]
</p>

The p:pictureCount element sort of inherits the subtract attribute of the original p:panel, because it gives the number of pictures in panel #17 (as opposed to the current panel). The singular and plural attributes (if provided) specify text to follow the number (in the absence of an explicit plural attribute, it adds s to the singular).
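The singular/plural rule described here is small enough to sketch in a few lines of Python. The function name is hypothetical, but the behaviour follows the description: use the singular only for exactly one, and default the plural to the singular plus s.

```python
def count_phrase(n, singular, plural=None):
    """Format a count with its noun, e.g. '0 votes', '1 vote', '2 votes'."""
    if plural is None:
        plural = singular + 's'  # default plural, as for 'vote' -> 'votes'
    return '%d %s' % (n, singular if n == 1 else plural)


for n in (0, 1, 2):
    print(count_phrase(n, 'vote'))
print(count_phrase(3, 'entry', 'entries'))  # explicit irregular plural
```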

Simplicity itself.

Update (2003-05-20): I have corrected a typo in the above URL. Sorry for any inconvenience.

CGI vs. If-Modified-Since and other stories

I demonstrated the Picky Picky Game prototype to the CAPTION committee. The main trouble was picture resources not being downloaded, or, oddly, vanishing when one refreshed the page. It worked better on the dial-up than the broadband connection (though Jo blames that on IE 6 being set up to cache nothing). I resolved to make an effort to sort out caching—or at least, the things my server needs to do to enable caching to work smoothly.

So, in my development system at home, I added If-Modified-Since and If-None-Match (etag) support to the routine that fetches picture data out of the database. I also added an Expires header set, as RFC 2616 demands, approximately one year in to the future. Result: none of the pictures appear.

The problem is that the web server I am using at home always returns status code 200 for CGI scripts (it ignores the Status pseudo-header). As a result, my clever 304 (‘Not modified’) responses result in apparently zero-length data. Argh!
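To make the mechanism concrete, here is a sketch in modern Python 3 of how a CGI script might decide between 200 and 304. It is not the game’s code: the function name and arguments are invented, ETag comparison is simplified (weak validators are ignored), and it relies on the web server translating the Status pseudo-header into the real response code.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def picture_response(environ, data, media_type, etag, modified):
    """Return (header lines, body) for a picture requested through CGI.

    etag is the quoted entity tag; modified is an aware UTC datetime.
    """
    headers = [
        'ETag: %s' % etag,
        'Last-Modified: %s' % formatdate(modified.timestamp(), usegmt=True),
    ]
    # CGI passes request headers in as HTTP_* environment variables.
    if_none_match = environ.get('HTTP_IF_NONE_MATCH')
    if_modified_since = environ.get('HTTP_IF_MODIFIED_SINCE')
    fresh = False
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in if_none_match.split(',')]
        fresh = '*' in candidates or etag in candidates
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            fresh = modified.replace(microsecond=0) <= since
        except (TypeError, ValueError):
            fresh = False  # unparseable date: send the full response
    if fresh:
        # A server that ignores Status will send 200 with this empty body.
        return ['Status: 304 Not Modified'] + headers, b''
    return (['Status: 200 OK',
             'Content-Type: %s' % media_type,
             'Content-Length: %d' % len(data)] + headers, data)
```

The comment on the 304 branch is the failure described above: the body is empty on purpose, so a server that forces the status to 200 delivers what looks like a zero-length picture.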

When I worked this out, I thought I would demonstrate to Jeremy that it worked in the stand-alone server (which does not use CGI). But Lo! all the pictures failed to appear once more. So did the page itself. What gives?

This time the trouble was its logging function—it tried to resolve the client IP address. Now, I thought that the address used by my PowerBook did have reverse look-up in my local DNS, but in any case, the server should not be indulging in DNS look-ups given that on my system that is a blocking operation that tends to mean the program locks up for 75 seconds. Luckily BaseHTTPServer makes it easy to override the function that indulges in DNS queries and it all now runs smoothly.
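The override in question is probably along these lines. The sketch uses the Python 3 module name http.server (the successor to BaseHTTPServer), and the class name is made up. In the old BaseHTTPServer, the method the logging code calls to name the client is address_string, which looked the host name up; recent Python 3 versions already return the bare IP address, so there the override is merely harmless.

```python
from http.server import BaseHTTPRequestHandler


class NoLookupHandler(BaseHTTPRequestHandler):
    """Request handler whose log lines never trigger a DNS query."""

    def address_string(self):
        # client_address is a (host, port) pair; use the numeric host
        # as-is rather than asking the resolver for a name.
        return self.client_address[0]
```

The request-handling methods (do_GET and friends) would be added to the same class as usual.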

On the positive side, I have made one cache-enhancing change that has worked, albeit only for the old panels (which saved their images as separate disc files rather than in ZODB). Simply put, there is another base URL used (in addition to the base of the web application and the base URL for static files), and this one is for picture files. This means that these old pictures are now, once again, served as static files, with etags and caching the responsibility of my host HTTP server, not me.

On to Panel 17!

I have taken the liberty of bumping the Picky Picky Game forward to the next panel. I have also cranked the speed up a notch: each ‘week’ will be 3 days for the next little while.

The idea is to let us test the various behind-the-scenes mechanisms of the game so that we can risk mentioning it to people at COMICS 2003 in Bristol this bank-holiday weekend.

The CSS has been tweaked to not use auto margins, because they are not supported in MSIE.

Now with BMP support

To make it easier to create pictures for the Picky Picky Game, I have added code to automatically convert Microsoft Windows Bitmap (.bmp) files into PNGs.

This is straightforward because I already was using PIL to check the dimensions of the images. Converting to PNG (if the format is one PIL knows) is pretty simple:

permittedTypes = ['image/png', 'image/jpeg', 'image/pjpeg', 'image/gif']
...
if imt not in permittedTypes:
    # Re-encode any other format PIL can read (e.g., BMP) as PNG, in memory.
    buf = StringIO.StringIO()
    im.save(buf, 'PNG')
    data = buf.getvalue()
    logger.log(slog.WARNING, 'Converted your image from %s to image/png' % imt,
               'This may have led to a slight loss of image quality.')
    imt = 'image/png'
    buf.close()

The above goes in the sequence of checks on uploaded images (after the check for width × height, but before the check for number of bytes). I think I spent longer creating a BMP image to test it on than I did writing the new code!

The advantage of BMP support is that, if you have Microsoft Windows, then you definitely have Microsoft Paint installed. So long as you know about Start menu → Programs → Accessories → Paint, and the Image → Attributes menu item, you can create panels for Picky Picky Game.

Fixed a JavaScript problem, one remains

Before going in to work today I have managed to fix one of the JavaScript problems (it causes MSIE to report ‘one error on page’), but only half-fixed the other (which causes artists’ names with links to vanish when you cycle through the panels). In the latter case, the name no longer vanishes, but, alas! the link does.

I think I need a JavaScript debugger—in other words, to install Mozilla on my PowerBook (the only computer I own with enough welly for Mozilla). Ho hum.

Back to your regular schedule

I have returned the Picky Picky Game to its weekly schedule. The reason for going at double speed was to fill up the home page with non-gash pictures, and this has been achieved. So you have until Thursday night to upload a candidate for panel 19.

We just got back from COMICS 2003, which was fun. We mentioned Picky Picky to as many people as we could, so either the server will be overwhelmed with activity, or no-one will bother clicking through and we will be miserably ignored. Who knows what the future holds?

The journey back was a disaster: approximately 4½ hours (mostly spent waiting in cold drafty stations). A nasty combination of reduced Oxford–Bristol service, engineering works and football game crowds made the train journeys particularly unpleasant; at least when there was a substitute bus we got to sit down...

Using Microsoft Paint to draw a panel

I have written a short note on how to use MSPAINT to draw a Picky Picky Game panel. The advantage of MSPAINT is that it is available on all Microsoft Windows computers, even ones not set up for image editing.

Much to my surprise, there is no longer a drawing program bundled with all Apple Macs: my PowerBook 12″ is without a drawing program.

Remember my details

I have added JavaScript to the upload form of the Picky Picky Game on caption.org to optionally remember your details for next time (using a cookie). This way you don’t have to enter your URL each time you upload a new panel.

Debugging JavaScript without a JavaScript debugger is a real pain in the arse, and illustrates how subtle aspects of language design affect the experience of working in that language. There is one crucial difference between Python and JavaScript. In Python, a variable is implicitly created the first time you assign to it; in JavaScript, it is created the first time you refer to it. This means that the following fragment is valid JavaScript:

var cookieHeader = this.$document.cookie;
var m = myRegexp.exec(cookiesHeader);
if (m) {
    ... use the match info to process the cookies ...
}

The equivalent Python looks like this:

cookieHeader = self._document.cookie
m = myRegexp.search(cookiesHeader)
if m:
    ... use the match info to process the cookies ...

In the JavaScript version, the regexp (used to extract one cookie from the Cookies header) will mysteriously never match and you will spend ages scrutinizing the regexp and flipping through the documentation on what is and is not valid regexp syntax in JavaScript. In Python you will get an error message telling you that the variable cookiesHeader is referred to before it is assigned to, and immediately realise its name is misspelled in the second line.
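The Python half of this comparison is easy to try out. The sketch below (modern Python 3) wraps the misspelled fragment in a function and catches the resulting exception; the function name, the cookie pattern, and the sample cookie string are invented for the demonstration.

```python
import re


def find_cookie(document_cookie, name):
    """Deliberately buggy: the second use of the variable is misspelled."""
    cookieHeader = document_cookie
    pattern = re.compile(r'(?:^|;\s*)%s=([^;]*)' % re.escape(name))
    m = pattern.search(cookiesHeader)  # typo: should be cookieHeader
    return m.group(1) if m else None


try:
    find_cookie('url=http%3A//example.org/; name=Fred', 'name')
except NameError as err:
    # The message names the misspelled variable, pointing at the typo.
    print('NameError:', err)
```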

The tedious thing about testing the ‘remember me’ option is that it involves repeatedly doing the very thing it is supposed to be saving me from: entering my URL and details on the picture-upload form. Luckily I was testing on Safari, which has a form auto-completion feature that makes repeatedly filling in the form less annoying—but which also makes the ‘Remember me’ feature almost entirely redundant ;-)

Now with comments!

I have added a simple comment system to the Picky Picky Game. Go me!

There are all sorts of design considerations when it comes to on-line comments. I was aiming at simplicity so it has no branching (threading), no HTML ... no nothing, basically. URLs in your posts get magically turned in to links, and blank lines become paragraph breaks, but that is about all. There is one discussion page per panel.
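A formatter with those two rules might look roughly like the following in modern Python 3. This is a sketch rather than the game’s code: the function name and the URL pattern are assumptions, and the pattern is deliberately naive (it will swallow trailing punctuation, for instance).

```python
import html
import re

URL = re.compile(r'https?://[^\s<>"]+')


def format_comment(text):
    """Escape a plain-text comment, link its URLs, and wrap paragraphs."""
    out = []
    # Blank lines separate paragraphs.
    for para in re.split(r'\n\s*\n', text.strip()):
        pieces = []
        pos = 0
        for m in URL.finditer(para):
            # Escape the text between links so any HTML typed by the
            # commenter is shown literally rather than interpreted.
            pieces.append(html.escape(para[pos:m.start()]))
            url = html.escape(m.group(0), quote=True)
            pieces.append('<a href="%s">%s</a>' % (url, url))
            pos = m.end()
        pieces.append(html.escape(para[pos:]))
        out.append('<p>%s</p>' % ''.join(pieces))
    return '\n'.join(out)


print(format_comment('Nice <b>panel</b>!\n\nSee http://example.org/x for more'))
```

Escaping everything and then adding only the markup the formatter itself generates is what makes the "no HTML" rule hold.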

Fixed a bug when starting new panel

The Picky Picky Game prototype stopped working this morning because of a bug in the code that sets up a new panel (and this morning is when the new round begins). I was able to patch the running version from work during lunch, and have now done a proper fix on my development system at home.

JavaScript image loading problems

The image-cycling feature of the Picky Picky Game prototype depends on using JavaScript to load images. If you click on the Cycle button before the images have been preloaded, then nothing visible happens, and it appears to have failed. There is no way for the user to see whether the images have loaded or not. I have attempted to add such an indication, only to be thwarted by what appear to be bugs in the web browsers I have tried it on.

New layout for archive pages

I have fiddled with the layout of the archive pages of the Picky Picky Game so that (a) it lays out all the panels in one row, rather than wrapping to multiple tiers per page, and (b) it uses an HTML table to achieve the layout. This should remove the occasional glitch where cycling panels caused the layout to change.

Picky Picky Disaster

Yesterday the server hosting our Picky Picky Game suffered a hardware failure, and for some reason we have lost all additions to the object database since it went live. After discussion on LiveJournal, enough people found pictures cached in their web browsers that I have been able to reconstruct an archive of the completed panels (with only a couple of lacunae), and so we plan to start the game afresh, with the output of the previous game inserted as Page 0 of its archive.

Dates Dates Dates

Once again the Picky Picky Game had problems calculating dates. Alas! that the Python-2.3 datetime module arrived too late to carry this burden on my behalf. This time it was not month #0 that caused problems, but, predictably perhaps, month #13. Feh.

TurboGears and OpenID, part 2

I wrote that I planned to return to my TurboGears version of the Picky Picky Game once I had picture-uploading working. Yesterday I finally had a spare afternoon to do some more hacking on Picky2, and got uploading of pictures working. But before that I will finish off my description of doing authentication with OpenID.