40 entries tagged picky
My back-burner project for the CAPTION web site is something
that I shall give the working name of Picky Picky Game.
This game will work in rounds, with everyone able to vote for
a picture from this round at the same time as people are
submitting pictures for the next round.
The idea is that we make a comic strip out of
the favourite pictures from each round.
A round might last a week or a month (depending on how often
people submit new pictures), so I want to make it a
configuration parameter. The code I was working on today
was the routine that works out the current round number, given
today’s date, the start date, and the period. For example,
if we had 2-week rounds and had started on 1 January 2002, then
we would today (2002-10-31) be in Round 21.
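The round calculation described above could be sketched as follows. This is an illustration rather than the game's actual routine: the name `current_round` and its parameters are assumptions, and rounds are counted from zero because that is the numbering that reproduces the Round 21 example.

```python
from datetime import date


def current_round(start, period_days, today):
    """Return the zero-based round number that `today` falls in.

    Hypothetical helper: `start` is the first day of round 0 and
    `period_days` is the configured length of a round in days.
    """
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    elapsed = (today - start).days
    if elapsed < 0:
        raise ValueError("the game has not started yet")
    return elapsed // period_days


# 31 October 2002 is 303 days after 1 January 2002; 303 // 14 is 21.
print(current_round(date(2002, 1, 1), 14, date(2002, 10, 31)))  # prints 21
```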
Found some time to continue work on
the Picky Picky Game. I have something which, given a
graphics file, writes it in to the correct place in the
directory structure. Tonight’s task was a routine for
generating the index page, based on the pictures stored so far.
In the eventual web application, this routine will be invoked in
CGI scripts
whenever a new picture is added or vote recorded. For the
present I can just run the Python script (one of the ways
in which creating web apps in Python is less hassle than, say,
ASP .Net or webclasses).
The index page format is mainly controlled through a
‘skin’ file, index.skin. This has most
of the HTML, with special XML tags for interpolating the dynamic
content. This way hopefully Jeremy will be able to hack the
HTML without touching any of the application code. (The
immediate inspiration for the term skin comes from the
Helma Object Publisher system,
which does something similar, but using JavaScript.)
The picture metadata is written in XML, which is straightforward
enough except that Python’s native SAX support is broken:
it does not support XML namespaces! I have fixed this
with my own SAX filter, dubbed SaxLifter: it processes
startElement events by scanning the attributes for
namespace prefixes, maintaining a stack of namespace mappings,
and generating startElementNS events. Presumably
if I were using the XML-SIG or 4Thought enhancements to
Python things would work better. Sigh.
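The filter technique described here (scan attributes for namespace declarations, keep a stack of mappings, re-emit namespace-aware events) might look roughly like this in present-day Python 3. It is a sketch, not the original SaxLifter: the class names `NamespaceLifter` and `Recorder` and the `urn:example:picky` URI are invented for illustration. Note too that current versions of `xml.sax` can report namespaces themselves when `feature_namespaces` is switched on, so a filter like this is mainly of interest as a demonstration.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl


class NamespaceLifter(ContentHandler):
    """Turn plain startElement events into startElementNS events."""

    def __init__(self, downstream):
        super().__init__()
        self.downstream = downstream
        self.scopes = [{}]  # stack of prefix -> namespace URI mappings

    def _resolve(self, qname, scope, is_attribute=False):
        prefix, colon, local = qname.partition(":")
        if not colon:
            # Unprefixed attributes have no namespace; unprefixed
            # elements take the default namespace, if one is declared.
            return (None if is_attribute else scope.get(""), qname)
        return (scope.get(prefix), local)

    def startElement(self, name, attrs):
        scope = dict(self.scopes[-1])
        for qname in attrs.getNames():
            if qname == "xmlns":
                scope[""] = attrs.getValue(qname)
            elif qname.startswith("xmlns:"):
                scope[qname[len("xmlns:"):]] = attrs.getValue(qname)
        self.scopes.append(scope)

        values, qnames = {}, {}
        for qname in attrs.getNames():
            if qname == "xmlns" or qname.startswith("xmlns:"):
                continue  # declarations are not passed on as attributes
            key = self._resolve(qname, scope, is_attribute=True)
            values[key] = attrs.getValue(qname)
            qnames[key] = qname
        self.downstream.startElementNS(
            self._resolve(name, scope), name, AttributesNSImpl(values, qnames))

    def endElement(self, name):
        scope = self.scopes.pop()
        self.downstream.endElementNS(self._resolve(name, scope), name)

    def characters(self, content):
        self.downstream.characters(content)


class Recorder(ContentHandler):
    """Downstream handler that just remembers the names it is given."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        self.names.append(name)


recorder = Recorder()
doc = b'<p:panel xmlns:p="urn:example:picky" skin="index-panel"><b/></p:panel>'
xml.sax.parseString(doc, NamespaceLifter(recorder))
print(recorder.names)  # [('urn:example:picky', 'panel'), (None, 'b')]
```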
The overall strategy is to generate as much static HTML as
possible—that is, instead of creating the HTML for the
list of pictures afresh each time someone visits the site (which
is what PHP and ASP, etc., do), I intend to generate it
only when a new picture is added to the list. Since adding
pictures will happen much more rarely than viewing the list,
this reduces the overall load on the web server. The aim is to
use CGI only in
the pages that make a change (adding a picture or voting).
I have now written a parser for HTML-4.0 file uploads (forms
with enctype multipart/form-data). It will need
some finessing to get character encodings to work right, but for
the simple cases I tried it uploaded files flawlessly and,
moreover, plugged in to the back-end script I mentioned in
an earlier installment.
Alas! When I tried uploading from Jeremy’s NT box, my
Python program crashed with an IOError exception
with errno=EAGAIN. I guess I need to do some
sort of loop to fill my buffer. Ho hum.
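One way such a buffer-filling loop might be written in modern Python 3 is shown below. It is a sketch under assumptions: the function name `read_exactly` is invented, it works on a Unix file descriptor, and it waits with `select` rather than spinning. Nothing here is taken from the actual upload script, and it cannot be claimed that this would have cured the problem described.

```python
import os
import select


def read_exactly(fd, nbytes, timeout=10.0):
    """Read up to nbytes from fd, waiting whenever no data is ready.

    Hypothetical helper. Returns fewer than nbytes only if the other
    end closes the stream first.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # Block (up to the timeout) until the descriptor is readable,
        # instead of retrying in a tight loop.
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            raise TimeoutError("no data arrived within %s seconds" % timeout)
        try:
            chunk = os.read(fd, remaining)
        except BlockingIOError:
            continue  # EAGAIN: nothing there after all, wait again
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In a CGI script the descriptor would be that of standard input and `nbytes` would come from the `CONTENT_LENGTH` environment variable.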
It occurs to me I may be being harsh on Opera;
I notice that elsewhere they show a preference for
splitting MIME parameters over multiple physical lines. For
example, they use
Content-disposition: form-data;
    name="fred"
as opposed to
Content-disposition: form-data; name="fred"
It is just about possible that this confuses thttpd
so that it clips everything after the first CRLF when passing
the headers to my script via CGI...?
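For reference, a header split over several physical lines is normally handled by joining each continuation line (one that begins with a space or tab) to the line before it. The sketch below shows that unfolding step in Python 3; the function name `unfold_headers` is an assumption and this is not code from thttpd or from the upload parser.

```python
import re


def unfold_headers(raw):
    """Join folded header lines and return a list of (name, value) pairs."""
    # A line break followed by spaces or tabs continues the previous line.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", raw)
    headers = []
    for line in unfolded.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return headers


folded = 'Content-disposition: form-data;\r\n    name="fred"\r\n\r\n'
single = 'Content-disposition: form-data; name="fred"\r\n\r\n'
print(unfold_headers(folded) == unfold_headers(single))  # prints True
```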
On Monday I was troubled by EAGAIN
interruptions when reading in a CGI
script’s data. It turns out Python has a cgi
module already. But when I tried creating a script that
used that, it failed to work with Opera’s boundary-less
multipart (the built-in cgi module uses the
multifile module, which I tried and rejected
earlier).
I have tried looping until EAGAIN does not
happen, but I put a limit of 10 iterations so as not
to chew up the CPU. No dice. I have also tried using the
fcntl module to remove the O_NONBLOCK
flag from stdin. The result is that instead of crashing with
EAGAIN it waits indefinitely (and gets interrupted
by thttpd’s watchdog timer).
The upshot of this is that I have the beginnings of a CGI
script that works if I connect to it from the same machine the
server is running on, but not if I connect to it from a
different machine (an NT box) on the same network.
The thing is, I know that people have successfully
written CGI programs in Python, and none of the examples
I find on-line have any mention of these phenomena.
I thought I’d try out a different CGI framework, such
as jonpy, and this
requires Python 2.2. So I have now installed
Python 2.2 (carefully installing GNU db, expat, etc. first
so these modules will be built). During its self-tests,
I noticed that test_socket.py takes 2½
minutes (to do something that should take approximately no
time). Come to think of it, initiating connections to my Linux
box from Jeremy’s NT box also takes an inordinate amount of time. That
might be why initiating HTTP connections to my
thttpd instance also takes an inordinate amount of
time, so long that thttpd kills the CGI rather than
waste any more time. In other words, my CGI problems may mostly
stem from a broken network stack. Terrific. This is a
variation on Joel
Spolsky’s law of leaky abstractions: I would like to
be able to believe in POSIX’s abstraction of sockets as
being a lot like a file, but sadly it is all frinked up.
Another reason to spend a week or two installing Debian some time.
I think the way forward for now is probably to ignore the
network problems and cross my fingers when I install it on
the actual server. Given that a fairly thorough search of the WWW
and Netnews reveals no discussion of the sort of problems
I’ve been having, I am fairly sure it is some
freakish glitch in my computer...
Even in a toy web application like the Picky Picky Game, it is
possible (but unlikely) that two people will want to upload a
picture at (nearly) the exact same moment. If two processes try
to write the same file at the same time, the results could be a
mess. It follows that we need to include something to
co-ordinate the changes.
Using ZEO to coordinate CGI scripts
Converting my non-concurrent code to instead use a persistent
store coordinated through ZEO was pretty easy once I’d grokked the
documentation. In fact most of the work consisted of deleting
some of the routines for just-in-time reading back of the
metadata, since that is now taken care of for me by ZODB.
The final piece in the puzzle of my PPG platform is the Python
Imaging Library (PIL) from Secret Labs AB
(PythonWare). This makes it easy to check that the uploaded
images are the right dimensions, for example:
# Assumes Image (from PIL) and StringIO are imported, and that
# data holds the bytes of the uploaded file.
im = Image.open(StringIO.StringIO(data))
width, height = im.size
if width > game.maxWidth or height > game.maxHeight:
    log('Image is too large: the maximum is %d × %d.'
        % (game.maxWidth, game.maxHeight), STOP)
    ok = 0
I don’t even need to know whether the image is a
PNG, JPEG, or GIF.
(Sunday night.)
Still nothing up for you to see yet, I’m afraid. (Apart
from anything else, I need to ask
my host to install a few
Python packages...) But I do now have the start of the
second CGI script, the one that accepts readers’ votes for
the current round of pictures. These votes are later used to
decide which picture to use for that panel of the comic strip.
At present the script accepts your vote but does not display
it in any way.
If you vote again, your previous ballot is silently overwritten.
I plan to support Approval
Voting in future by having a page where you have a checkbox
for each candidate picture and can select as many as you like.
The word ‘your’ is a little misleading; we use
people’s IP addresses as their identifiers, which sort of
works most of the time, but means that people sharing a proxy
server will end up sharing a vote. The alternative (requiring
users to register in order to vote) is not likely to work
because no one will want to register.
Update (Monday night):
The voting form now shows you the pictures with checkboxes.
When you first visit the page, the picture you clicked on is
ticked, but then you can tick as many more as you like. Because
of the way HTML forms are processed, each form parameter is
potentially a sequence anyway, so the code for each time around
the voting form can be exactly the same. The code that adjusts
the totals is very simple:
def vote(self, uid, pns):
    """Register a vote from the user identified by uid.
    uid is an integer, uniquely identifying a voter.
    pns is a list of picture numbers
    """
    oldPns = self.userVotes.get(uid, [])
    if pns == oldPns:
        return
    for pn in oldPns:
        self.pictures[pn].nVotes -= 1
    for pn in pns:
        self.pictures[pn].nVotes += 1
    self.userVotes[uid] = pns
The first line retrieves that user’s old ballot, if any.
The first for statement reverses the effect (if
any) of their former vote, the second counts the new vote.
Finally the ‘ballot’ is saved for later. Behind the
scenes, ZODB takes care of reading the old data in off disc and
(when the transaction is committed) saving the updated
data.
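To see the totals move, the same logic can be exercised with plain in-memory stand-ins. The `Picture` and `Round` classes below are hypothetical substitutes for the persistent objects (there is no ZODB here), written for Python 3; only the body of `vote` follows the method above.

```python
class Picture:
    """Stand-in for a stored picture; only the vote total matters here."""

    def __init__(self):
        self.nVotes = 0


class Round:
    """Stand-in for the object that owns the pictures and the ballots."""

    def __init__(self, n_pictures):
        self.pictures = [Picture() for _ in range(n_pictures)]
        self.userVotes = {}

    def vote(self, uid, pns):
        oldPns = self.userVotes.get(uid, [])
        if pns == oldPns:
            return
        for pn in oldPns:
            self.pictures[pn].nVotes -= 1  # undo the earlier ballot
        for pn in pns:
            self.pictures[pn].nVotes += 1  # count the new one
        self.userVotes[uid] = pns


r = Round(3)
r.vote(1, [0])      # voter 1 picks picture 0
r.vote(2, [0, 2])   # voter 2 approves of pictures 0 and 2
r.vote(1, [1, 2])   # voter 1 changes their mind
print([p.nVotes for p in r.pictures])  # prints [1, 1, 2]
```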
My paid job involves writing a web application as well, except
this one uses Microsoft ASP .Net linked via ADO .Net to Microsoft SQL
Server® 2000. To do a similar job to the above
snippet, I would be writing two SQL stored procedures (one to retrieve
the existing ballot, one to alter the ballot). Invoking a
stored procedure is several more lines of code in the C♯ or VB .Net layer as you create a Command
object, add parameters to it, execute it, and dispose of the
remains. (Or you can create DataSet objects which are even
worse, but have specialized wizards to help you draft the code.)
The actual algorithm (the encoding of the business logic) would
be buried in dozens of lines of boilerplate. By comparison, the Python+ZODB implementation is a
miracle of concision and clarity. The
ZOPE people deserve much kudos.
Having got the first working version of the Picky Picky Game,
I have naturally now pulled it apart again. I decided
that now it is in a state where it makes sense to try to
package up a version for Adrian to try installing, I had better
think about getting the module and package names right, since it
will be harder to change them later.
I have reorganized my Python classes in to their own
package pdc (designed to prevent name collisions
with WWW-oriented packages by other people). I also
changed some of the file names, so that ‘import
httputils’ becomes ‘from pdc import www’.
There is now a proper unit-test
suite for the www module (which has functions like urlDecode,
urlResolve, and xmlencode). This is
easier to do for this module than the others, which tend to
involve creating scads of HTML text which will be hard to check
for errors. For the URL-manipulating functions, the unit tests
turned out to be invaluable: there are a lot of corner
cases that I only sorted out because I had tests for
all of them.
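As an illustration of the kind of corner-case tests meant here, the sketch below uses Python 3's `unittest` with standard-library functions standing in for the `www` module: `urllib.parse.unquote` for `urlDecode`, `urllib.parse.urljoin` for `urlResolve`, and `xml.sax.saxutils.escape` for `xmlencode`. The stand-ins and the particular cases are assumptions, not the contents of the real test suite.

```python
import unittest
from urllib.parse import unquote, urljoin
from xml.sax.saxutils import escape


class UrlCornerCases(unittest.TestCase):
    def test_decode_percent_escapes(self):
        self.assertEqual(unquote("a%20b%2Fc"), "a b/c")

    def test_resolve_parent_directory(self):
        self.assertEqual(
            urljoin("http://example.org/a/b", "../c"),
            "http://example.org/c")

    def test_resolve_query_only_reference(self):
        self.assertEqual(
            urljoin("http://example.org/a/b", "?page=2"),
            "http://example.org/a/b?page=2")

    def test_escape_markup_characters(self):
        self.assertEqual(escape("a < b & c"), "a &lt; b &amp; c")


# Run the suite directly so the example works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UrlCornerCases)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```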
Spent another dollop of time on the Picky Picky Game, moving
code in to my www module. This is a slow process
because I have decided that this module is going to have a
complete set of unit tests (wwwtests.py). Writing
tests after the fact can be a little depressing: it keeps
digging up bugs (i.e., mistakes) you made when writing the
function in the first place. The Extreme Programming gurus say
this is why writing the
tests first is psychologically important (as well as being
important to their methodology) because you end the day
successfully passing tests that failed earlier in the day,
rather than depressingly writing tests that either never get
triggered or which show up flaws in your precious code.
Anyway, the upshot of today’s not particularly intensive
work is that it does more or less what it did last week, but is
maybe a bit more reliable than it was before.
In the process of getting the CGI scripts to work again after
reorganizing the libraries, I have further refactored them by
moving the request-processing code in to the Picky Picky Game
library, leaving the CGI script to contain just a few
configuration parameters and an invocation of the library
routine. This style is not unlike that used by Joe Gregorio in
his Well-Formed
Web experiment.
There are various advantages to moving most
of the code out of the CGIs themselves: better information
hiding, for example (which can be considered a security feature
as well as good programming practice). It also allows the bulk
of the code to be stored in byte-compiled form on disc, which
might make processing requests a little faster.
I have created an experimental, pre-alpha, test-of-concept,
categorically not finished or complete package that is a
snapshot of the Picky Picky Game development so far (picky-0.1.tgz,
picky01.zip). An
enthusiastic web master with a Python-compatible server
should be able to install this and make it go.
What’s more, if this version can be installed, then future
versions should also be installable (since I don’t
intend to require any additional features). But don’t
hold me to that... Your mileage may vary.
There is one important unresolved issue, the ‘EAGAIN’ problem,
which I decided to put to one side
for now. I hope to use this snapshot to test the
problem in different environments.
Does it make sense to rework the URLs for the Picky Picky Game so
they more resemble
REST-inspired
interfaces like the Well-Formed Web
experiment? I have belatedly returned to the project and
started by creating a self-contained server version (as opposed
to the CGI-based one created earlier).
This gives me more control over the URL scheme used for the
(proposed) site.
More irrelevant musings
Update (18 March 2003). Really what I am
talking about here is more to do with cool URLs
(to use the term coined by Tim Berners-Lee) than REST per se.
REST is largely based on using HTTP as originally designed, which
includes respecting the intended semantics of the methods
GET and POST (basically, requests that
add or change things should use POST and not
GET; requests that view information without
altering it should use GET, not POST).
A flaw in the HTML 4 definition makes this
annoyingly difficult.
So far getting uploads to work with the
Picky Picky Game is an uphill
struggle. Some web browsers work and some do not, and it is
going to require some fancy diagnostic tools to find out why
not.
I am rethinking my original plan for the Picky Picky Game, which was to store
resources in files as often as possible. For example,
index.html is a static file (not dynamically
generated every time someone visits it). This requires that
when something happens that means index.html should
change, this file has to be updated.
Pros and cons
As mentioned earlier, I am working on a version of the Picky Picky Game that does not need to
write to files (since some web servers will be set up that way
for security reasons). In some ways this simplifies things,
because it means there is only one URL hierarchy to worry about.
URL design
I have converted the Picky Picky Game so that it can run as a
BaseHTTPServer server as an alternative to running as a
CGI script, in
order to avoid the performance penalty caused by starting up
a fresh Python process for each request. But I discovered
that the server would lock up from time to time.
Closing your connections
Had a bug in the Picky Picky Game where
uploaded pictures might have backslashes left in their names
(the picture name being derived from the file name supplied by
the client computer). Technically it is OK to have backslashes
in a URL, and they should
be treated like any other character. Some web browsers
second-guess you, however, and replace backslashes with slashes
(http://foo/bar\baz is treated as
http://foo/bar/baz), with the result that these
pictures failed to appear.
The solution is, of course, to (a) change the code for
translating file names in to picture names so that it removes
backslashes, and (b) fix the existing databases.
ZODB makes the second part pretty easy; having acquired a
Game instance from the database, you just run a
script like
# Rename any stored picture whose name changes under the fixed rule.
for rn, r in game.rounds.items():
    for pic in r.pictures:
        s = r.sanitizedName(pic.name, pic)
        if s != pic.name:
            pic.name = s
            pic.dataUri = s + picky.mediaTypeSuffix(pic.mediaType)
The function sanitizedName is the one that has to
be fixed for part (a).
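A name-cleaning rule of this kind might be sketched as follows in Python 3. This is a guess at the general approach, not the game's own sanitizedName: the function name `sanitized_name`, its single-argument signature, the choice of permitted characters and the `untitled` fallback are all assumptions.

```python
import re


def sanitized_name(filename):
    """Derive a URL-safe picture name from a client-supplied file name."""
    # Some browsers send a full Windows path, so split on both kinds
    # of separator and keep only the last component.
    base = re.split(r"[\\/]", filename)[-1]
    # Drop the extension, on the assumption that a suffix derived from
    # the media type is appended separately.
    stem = base.rsplit(".", 1)[0] if "." in base else base
    # Replace runs of anything awkward in a URL path segment.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "-", stem).strip("-")
    return cleaned or "untitled"


print(sanitized_name(r"C:\My Pictures\bar\baz.png"))  # prints baz
print(sanitized_name("panel one.gif"))                # prints panel-one
```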
Luckily, Apache honours the Status pseudo-header,
and so my
Picky Picky Game CGI script can issue response code 304 and it works. Yay. My
legions of testers report that the downloading goes more
smoothly now.
The finished panels now also have text below them saying how
many candidate panels were uploaded, and how many votes were
recorded. The tricky bit was making sure it says
‘votes’ when there are 0 or more than 1 votes, and
‘vote’ when there is only one.
The dynamic HTML
is generated using a template file that I have whimsically
named a skin. This is a more-or-less XML file (meaning
that I intend it to be well-formed XML, but actually
process it as plain text). Mostly it is XHTML, but it can
include elements in special namespaces like
<p:version/>. These are replaced with text
generated by Python functions or strings (e.g.,
<p:version/> is replaced by something like
0.9 <pdc 2003-05-19>). The comics panel at the left
of the index page is produced with the following incantation:
<p:panel subtract="3" skin="index-panel"/>
This says to render the panel whose number is 3 less than the
current panel (so if the current panel is #20, this shows #17),
using the skin contained in the file
index-panel.skin. This allows the hypothetical
graphic designer of the Picky Picky Game considerable latitude
in how panels are displayed.
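Since the skin is processed as plain text, the substitution step could be approximated with a regular expression that finds empty `p:` elements and hands their attributes to a table of handlers. The Python 3 sketch below is an assumption about how such a thing might work: `render_skin`, the handler table and the fixed current panel number 20 are invented for the example and are not the game's code.

```python
import re

# Matches empty elements such as <p:version/> or <p:panel subtract="3"/>.
TAG = re.compile(r'<p:(\w+)((?:\s+\w+="[^"]*")*)\s*/>')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def render_skin(text, handlers):
    """Replace each <p:name .../> with a string or a function's result."""

    def replace(match):
        name, raw_attrs = match.group(1), match.group(2)
        handler = handlers.get(name)
        if handler is None:
            return match.group(0)  # leave unknown elements untouched
        if callable(handler):
            return handler(**dict(ATTR.findall(raw_attrs)))
        return handler

    return TAG.sub(replace, text)


CURRENT_PANEL = 20  # assumed for the example


def panel(subtract="0", skin="panel"):
    return "[panel %d using %s.skin]" % (CURRENT_PANEL - int(subtract), skin)


handlers = {"version": "0.9", "panel": panel}
print(render_skin('<p:panel subtract="3" skin="index-panel"/>', handlers))
# prints [panel 17 using index-panel.skin]
```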
While rendering index-panel.skin, it comes across
this fragment:
<p class="detail">
[<p:pictureCount singular="candidate"/>,
<p:voteCount singular="vote"/>.]
</p>
The p:pictureCount element sort of inherits the
subtract attribute of the original
p:panel, because it gives the number of pictures in
panel #17 (as opposed to the current panel). The
singular and plural attributes (if
provided) specify text to follow the number (in the absence of
an explicit plural attribute, it adds
s to the singular).
Simplicity itself.
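The singular/plural rule is small enough to show in full. A minimal sketch, with the function name `count_text` assumed:

```python
def count_text(n, singular, plural=None):
    """Format a count followed by the right form of the noun.

    Without an explicit plural, an "s" is added to the singular; the
    singular form is used only when the count is exactly 1.
    """
    if plural is None:
        plural = singular + "s"
    return "%d %s" % (n, singular if n == 1 else plural)


print(count_text(0, "vote"))       # prints 0 votes
print(count_text(1, "vote"))       # prints 1 vote
print(count_text(3, "candidate"))  # prints 3 candidates
```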
Update (2003-05-20):
I have corrected a typo in the above URL. Sorry
for any inconvenience.
I demonstrated the
Picky Picky Game prototype
to the
CAPTION committee. The
main trouble was picture resources not being
downloaded, or, oddly, vanishing when one refreshed the
page. It worked better on
the dial-up than the broadband connection (though Jo
blames that on
IE 6
being set up to cache nothing).
I resolved to make an effort to sort out
caching—or at least, the things my server needs to
do to enable caching to work smoothly.
So, in my development system at home,
I added If-Modified-Since and
If-None-Match (etag) support to the routine
that fetches picture data out of the database. I also
added an Expires header set, as
RFC 2616 advises for responses that never expire,
approximately one year in to the future.
Result: none of the pictures appear.
The problem is
that the web server I am using at home always
returns status code 200 for CGI scripts (it ignores the
Status pseudo-header). As a result, my clever
304 (‘Not modified’) responses result in
apparently zero-length data. Argh!
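The decision between 200 and 304 can be sketched in Python 3 as below. This is not the routine from the game: the function name `conditional_response`, the CGI-style `environ` dictionary and the simple exact-match comparison of entity tags are assumptions made to keep the example short.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(environ, etag, modified, now=None):
    """Return (status, headers) for a picture that changes rarely.

    `modified` must be a timezone-aware UTC datetime.
    """
    now = now or datetime.now(timezone.utc)
    headers = [
        ("ETag", etag),
        ("Last-Modified", format_datetime(modified, usegmt=True)),
        # About a year ahead, the usual way to say "never expires".
        ("Expires", format_datetime(now + timedelta(days=365), usegmt=True)),
    ]
    not_modified = False
    if_none_match = environ.get("HTTP_IF_NONE_MATCH")
    if_modified_since = environ.get("HTTP_IF_MODIFIED_SINCE")
    if if_none_match is not None:
        # The entity tag takes precedence over the date when both are sent.
        tags = [t.strip() for t in if_none_match.split(",")]
        not_modified = etag in tags or "*" in tags
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            not_modified = modified.replace(microsecond=0) <= since
        except (TypeError, ValueError):
            not_modified = False  # unparseable date: send the full body
    return ("304 Not Modified" if not_modified else "200 OK"), headers
```

A CGI script would then print the status as a `Status:` line ahead of the other headers, and omit the body for a 304. As the text notes, that only helps if the web server honours the pseudo-header.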
When I worked this out, I thought I would
demonstrate to Jeremy that it worked in the stand-alone server (which
does not use CGI). But Lo! all the pictures failed to
appear once more. So did the page itself. What gives?
This time the trouble was its logging
function—it tried to resolve the client IP
address. Now, I thought that the address used by my
PowerBook did have reverse look-up in my local
DNS, but
in any case, the server should not be indulging in DNS
look-ups given that on my system that is a blocking
operation that tends to mean the program locks up for 75
seconds. Luckily BaseHTTPServer makes it
easy to override the function that indulges in DNS
queries and it all now runs smoothly.
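In the old BaseHTTPServer module the look-up happened in the request handler's `address_string` method, which the logging code calls. A sketch of the override, written against Python 3's `http.server` (the successor module), with the class name `NoLookupHandler` assumed; recent Python 3 releases already return the bare address, so there the override mostly documents the intent.

```python
from http.server import BaseHTTPRequestHandler


class NoLookupHandler(BaseHTTPRequestHandler):
    """Request handler that logs the client's IP address as-is."""

    def address_string(self):
        # Return the numeric address without asking the DNS for a name.
        return self.client_address[0]
```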
On the positive side, I have made one cache-enhancing
change that has worked, albeit only for the old panels
(which saved their images as separate disc files rather
than in ZODB).
Simply put, there is another base URL
used (in addition to the base of the web application and
the base URL for static files), and this one is for
picture files. This means that these old pictures are
now, once again, served as static files, with etags
and caching the responsibility of my host
HTTP server,
not me.
I have taken the liberty of bumping the Picky
Picky Game forward to the next panel. I have also cranked
the speed up a notch: each ‘week’ will be 3 days for
the next little while.
The idea is to let us test the various behind-the-scenes
mechanisms of the game so that we can risk mentioning it to
people at COMICS
2003 in Bristol this bank-holiday weekend.
The CSS has been
tweaked to not use auto margins, because they are
not supported in MSIE.
To make it easier to create pictures for the Picky
Picky Game, I have added code to automatically convert
Microsoft Windows Bitmap (.bmp) files into
PNGs.
This is straightforward because I already was using PIL to check the
dimensions of the images. Converting to PNG (if the format is
one PIL knows) is pretty simple:
permittedTypes = ['image/png', 'image/jpeg', 'image/pjpeg', 'image/gif']
...
if imt not in permittedTypes:
    # Re-encode anything else PIL could open (e.g. BMP) as PNG, in memory.
    buf = StringIO.StringIO()
    im.save(buf, 'PNG')
    data = buf.getvalue()
    logger.log(slog.WARNING, 'Converted your image from %s to image/png' % imt,
               'This may have led to a slight loss of image quality.')
    imt = 'image/png'
    buf.close()
The above goes in the sequence of checks on uploaded images
(after the check for width × height, but before the check
for number of bytes). I think I spent longer creating
a BMP image to test it on than I did writing the new code!
The advantage of BMP support is that, if you have Microsoft
Windows, then you definitely have Microsoft Paint installed.
So long as you know about Start menu → Programs →
Accessories → Paint, and the Image → Attributes menu item,
you can create panels for Picky Picky Game.
Before going in to work today I have managed to fix one of the
JavaScript problems (it causes MSIE to report ‘one error
on page’), but only half-fixed the other (which causes
artists’ names with links to vanish when you cycle through
the panels). In the latter case, the name no longer vanishes,
but, alas! the link does.
I think I need a JavaScript
debugger—in other words, to install Mozilla on my
PowerBook (the only computer I own with enough welly for
Mozilla). Ho hum.
I have returned the Picky Picky Game to its
weekly schedule. The reason for going at double speed was to
fill up the home page with non-gash pictures, and this has been
achieved. So you have until Thursday night to upload a
candidate for panel 19.
We just got back from COMICS 2003, which was fun. We mentioned
Picky Picky to as many people as we could, so either the server
will be overwhelmed with activity, or no-one will bother
clicking through and we will be miserably ignored. Who knows
what the future holds?
The journey
back was a disaster—approximately 4½ hours (mostly
spent waiting in cold drafty stations). A nasty combination of
reduced Oxford–Bristol service, engineering works and
football game crowds made the train journeys particularly
unpleasant; at least when there was a substitute bus we got to
sit down...
I have written a short note on how
to use MSPAINT to draw a Picky Picky Game panel.
The advantage of MSPAINT is that it is available on all
Microsoft Windows computers, even ones not set up for image
editing.
Much to my surprise, there is no longer a drawing program bundled
with all Apple Macs: my PowerBook 12″ is without a drawing
program.
I have added JavaScript to the upload form of the Picky Picky Game on
caption.org to optionally remember your details for next
time (using a cookie). This way you don’t have to enter
your URL each time you upload a new panel.
Debugging JavaScript without a JavaScript debugger is a real
pain in the arse, and illustrates how subtle aspects of language
design affect the experience of working in that language. There
is one crucial difference between Python and JavaScript. In
Python, a variable is implicitly created the first time you
assign to it; in JavaScript, it is created the first time you
refer to it. This means that the following fragment is valid
JavaScript:
var cookieHeader = this.$document.cookie;
var m = myRegexp.exec(cookiesHeader);
if (m) {
    ... use the match info to process the cookies ...
}
The equivalent Python looks like this:
cookieHeader = self._document.cookie
m = myRegexp.search(cookiesHeader)
if m:
    ... use the match info to process the cookies ...
In the JavaScript version, the regexp (used to extract one
cookie from the Cookies header) will mysteriously never match
and you will spend ages scrutinizing the regexp and flipping
through the documentation on what is and is not valid regexp
syntax in JavaScript. In Python you will get an error message
telling you that the variable cookiesHeader is
referred to before it is assigned to, and immediately
realise its name is misspelled in the second line.
The tedious thing about testing the ‘remember me’
option is that it involves repeatedly doing the very thing it is
supposed to be saving me from: entering my URL and details
on the picture-upload form. Luckily I was testing on Safari,
which has a form auto-completion feature that makes repeatedly
filling in the form less annoying—but which also makes the
‘Remember me’ feature almost entirely redundant
;-)
I have added a simple comment system to the Picky Picky Game. Go me!
There are all sorts of design considerations when it comes to
on-line comments. I was aiming at simplicity so it has no
branching (threading), no HTML ... no nothing, basically.
URLs in your posts get magically turned in to links, and blank
lines become paragraph breaks, but that is about all.
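The two transformations mentioned (URLs become links, blank lines become paragraph breaks) can be sketched in Python 3 as below. The function name `format_comment` and the deliberately simple URL pattern are assumptions. A real implementation would also want to cope with punctuation that directly follows a URL.

```python
import html
import re

# Deliberately simple: a URL runs up to the next whitespace or angle bracket.
URL = re.compile(r"https?://[^\s<>]+")


def format_comment(text):
    """Turn a plain-text comment into safe HTML paragraphs with links."""
    text = text.replace("\r\n", "\n").strip()
    rendered = []
    for para in re.split(r"\n\s*\n", text):
        # Escape first, so that any HTML typed by the commenter is inert.
        safe = html.escape(para, quote=True)
        linked = URL.sub(
            lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)), safe)
        rendered.append("<p>%s</p>" % linked)
    return "\n".join(rendered)


print(format_comment("See http://example.org/x now\n\n<b>Bye</b>"))
```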
There is
one discussion page per panel.
The Picky Picky Game
prototype stopped working this morning because of a bug in
the code that sets up a new panel (and this morning is when the
new round begins). I was able to patch the running version from
work during lunch, and have now done a proper fix on my
development system at home.
The image-cycling feature of the
Picky Picky Game
prototype depends on using JavaScript to load images. If
you click on the Cycle button before the images have been
preloaded, then nothing visible happens—it appears to have
failed. There is no way for the user to see whether the images
have loaded or not. I have attempted to add such an indication,
only to be thwarted by what appear to be bugs in the web
browsers I have tried it on.
I have fiddled with the layout of the archive pages of the Picky Picky Game so that
(a) it lays out all the panels in one row, rather than
wrapping to multiple tiers per page, and (b) it uses an
HTML table to achieve the layout. This should remove the
occasional glitch where cycling panels caused the layout to
change.
Yesterday the server hosting our
Picky Picky Game
suffered a hardware failure, and for some reason we have lost
all additions to the object
database since it went live. After
discussion
on LiveJournal, enough people found pictures cached in their
web browsers that I have been able to reconstruct
an
archive of the completed panels (with only a couple of
lacunae), and so we plan to start the game afresh, with the
output of the previous game inserted as Page 0 of its archive.
We have started a new Picky
Picky Game to replace the one
whose database vanished. I have added code to the server so
that the archive
of the old game is inserted before Page 1 of archive of
the new game. The first round of candidate panels has been
primed with the candidates from panel 25 of the old game.
I have created an
experimental RSS
2.0 feed for the Picky
Picky Game. It shows comments added to the pictures, plus
an entry whenever the game ‘clocks’ (that is, when a
new panel is completed). Seems to work. Haven’t tried it
in an aggregator yet, but it does validate.
I have tweaked the RSS feed for the Picky Picky Game so
that comments have the name of the person who wrote the comment
as their title, and also so that the ‘Round N
completed’ refers to the round that has been completed,
not to the one following it…
Once again the Picky Picky
Game had problems calculating dates. Alas! that the
Python-2.3 datetime module arrived too late to carry this burden
on my behalf. This time it was not month #0 that caused
problems, but, predictably perhaps, month #13. Feh.
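The entry does not say what the calculation was, so the following is only a guess at the general kind of fix: when stepping a month forwards or backwards by hand, doing the arithmetic on a zero-based month index keeps both month #0 and month #13 from ever appearing. The function name `add_months` is an assumption.

```python
def add_months(year, month, delta):
    """Return (year, month) shifted by delta months, with month in 1..12."""
    # Work with a single zero-based month count, then split it again.
    index = year * 12 + (month - 1) + delta
    new_year, zero_based_month = divmod(index, 12)
    return new_year, zero_based_month + 1


print(add_months(2003, 12, 1))   # prints (2004, 1)
print(add_months(2003, 1, -1))   # prints (2002, 12)
```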
The Picky Picky game had a bug which mainly affected
Teacake (badasstronaut): pictures submitted for the current
panel would instead turn up in last week's panel (the one being voted
on), or even once in the panel from the week before!
I wrote that I planned to return to my TurboGears version of the
Picky Picky Game once I had picture-uploading working. Yesterday I
finally had a spare afternoon to do some more hacking on Picky2, and got
uploading of pictures working. But before that I will finish off my description
of doing authentication with OpenID.