10 entries tagged cgi
Opera 5 omits the boundary parameter when uploading files.
Lynx 2.8.2 does not support uploading files at all (but,
oddly, does generate multipart/form-data forms properly; it
even gives the charset parameter to the content-type of its
form items). Python’s multifile module raises an exception
on all of the above, for some inputs.
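For reference, the boundary is supposed to arrive as a parameter
of the CONTENT_TYPE variable the server hands to the CGI script.
A rough sketch of fishing it out; the fallback for Opera’s
missing parameter (peeking at the first delimiter line of the
body) is pure guesswork on my part, not anything the RFCs bless:

    import os, re, sys

    def get_boundary(content_type, body_file):
        # The boundary should be a parameter of the Content-Type
        # header, e.g. multipart/form-data; boundary=----xyz
        m = re.search(r'boundary="?([^";]+)"?', content_type)
        if m:
            return m.group(1)
        # Guesswork for Opera 5: the first line of the body is the
        # first delimiter, '--' followed by the boundary.  This
        # consumes the line, so the parser must allow for that.
        line = body_file.readline()
        if line[:2] == '--':
            return line[2:].strip()
        return None

    boundary = get_boundary(os.environ.get('CONTENT_TYPE', ''), sys.stdin)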
I guess that if I want to handle uploaded images, I get to
write my own multipart/form-data parser from scratch. I have
already done this in C++ for work; I guess I can do it again
in Python. Sigh.
I have now written a parser for HTML-4.0 file uploads (forms
with enctype multipart/form-data). It will need some finessing
to get character encodings to work right, but for the simple
cases I tried it uploaded files flawlessly, and moreover,
plugged in to the back-end script I mentioned in an earlier
installment.
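The shape of the parser is roughly the following (a
much-simplified sketch rather than the real thing, and it cheats
by assuming the whole body is already in memory as a string):

    def parse_multipart(body, boundary):
        # Each part is: delimiter line, headers, blank line, data.
        # Prepend CRLF so the first delimiter matches the same
        # pattern as the rest.
        chunks = ('\r\n' + body).split('\r\n--' + boundary)
        parts = []
        for chunk in chunks[1:]:
            if chunk[:2] == '--':
                break                        # the final delimiter
            head, data = chunk.split('\r\n\r\n', 1)
            headers = {}
            for line in head.split('\r\n'):
                if ':' in line:
                    name, value = line.split(':', 1)
                    headers[name.strip().lower()] = value.strip()
            parts.append((headers, data))
        return parts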
Alas! When I tried uploading from Jeremy’s NT box, my
Python program crashed with an IOError exception with
errno=EAGAIN. I guess I need to do some sort of loop to fill
my buffer. Ho hum.
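Presumably something along these lines; the select call is my
guess at how to wait politely for the rest of the data instead
of spinning:

    import errno, os, select, sys

    def read_exactly(f, length):
        # Keep reading until we have all 'length' bytes, retrying
        # when a read from the non-blocking descriptor raises EAGAIN.
        data = ''
        while len(data) < length:
            try:
                chunk = f.read(length - len(data))
            except IOError, e:
                if e.errno == errno.EAGAIN:
                    select.select([f], [], [])  # block until readable
                    continue
                raise
            if not chunk:
                break                           # premature EOF
            data = data + chunk
        return data

    body = read_exactly(sys.stdin, int(os.environ['CONTENT_LENGTH']))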
It occurs to me I may be being harsh on Opera;
I notice that elsewhere they show a preference for
splitting MIME parameters over multiple physical lines. For
example, they use

    Content-disposition: form-data;
     name="fred"

as opposed to

    Content-disposition: form-data; name="fred"

It is just about possible that this confuses thttpd
so that it clips everything after the first CRLF when passing
the headers to my script via CGI...?
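If that is what is happening, unfolding is at least cheap to do
before parsing: a folded line starts with a space or tab, so
something like this would splice the parameters back together:

    def unfold(header_text):
        # RFC 822 folding: a line beginning with whitespace
        # continues the previous header line.
        lines = []
        for line in header_text.split('\r\n'):
            if lines and line and line[0] in ' \t':
                lines[-1] = lines[-1] + ' ' + line.strip()
            else:
                lines.append(line)
        return lines

    # unfold('Content-disposition: form-data;\r\n name="fred"')
    # yields ['Content-disposition: form-data; name="fred"']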
On Monday I was troubled by EAGAIN interruptions when reading
in a CGI script’s data. It turns out Python has a cgi module
already. But when I tried creating a script that used that, it
failed to work with Opera’s boundary-less multipart (the
built-in cgi module uses the multifile module, which I tried
and rejected earlier).
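For completeness, this is roughly all a script should need with
the standard module, when the standard module works (‘picture’
here is a made-up field name):

    import cgi

    form = cgi.FieldStorage()     # reads and parses stdin for a POST
    if form.has_key('picture'):   # hypothetical form field name
        item = form['picture']
        if item.filename:         # set only for file-upload fields
            data = item.file.read()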
I have tried looping until EAGAIN does not happen, but I put a
limit of 10 iterations so as not to chew up the CPU. No dice.
I have also tried using the fcntl module to remove the
O_NONBLOCK flag from stdin. The result is that instead of
crashing with EAGAIN it waits indefinitely (and gets
interrupted by thttpd’s watchdog timer).
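For the record, clearing the flag looks like this (it ‘worked’,
in the sense that the failure changed from an exception to a
hang):

    import fcntl, os, sys

    fd = sys.stdin.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if flags & os.O_NONBLOCK:
        # Put stdin back into blocking mode.
        fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)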
The upshot of this is that I have the beginnings of a CGI
script that works if I connect to it from the same machine the
server is running on, but not if I connect to it from a
different machine (an NT box) on the same network.
The thing is, I know that people have successfully
written CGI programs in Python, and none of the examples
I find on-line have any mention of these phenomena.
I thought I’d try out a different CGI framework, such
as jonpy, and this requires Python 2.2. So I have now installed
Python 2.2 (carefully installing GNU db, expat, etc. first
so these modules will be built). During its self-tests,
I noticed that test_socket.py takes 2½ minutes (to do
something that should take approximately no time). Come to
think of it, initiating connections to my Linux box from
Jeremy’s NT box also takes an inordinate amount of time. That
might be why initiating HTTP connections to my thttpd instance
is also so slow that thttpd kills the CGI rather than waste any
more time. In other words, my CGI problems may mostly stem from
a broken network stack. Terrific. This is a variation on Joel
Spolsky’s law of leaky abstractions: I would like to
be able to believe in POSIX’s abstraction of sockets as
being a lot like a file, but sadly it is all frinked up.
Another reason to spend a week or two installing Debian some time.
I think the way forward for now is probably to ignore the
network problems and cross my fingers when I install it on
the actual server. Given that a fairly thorough search of the WWW
and Netnews reveals no discussion of the sort of problems
I’ve been having, I am fairly sure it is some
freakish glitch in my computer...
Even in a toy web application like the Picky Picky Game, it is
possible (but unlikely) that two people will want to upload a
picture at (nearly) the exact same moment. If two processes try
to write the same file at the same time, the results could be a
mess. It follows that we need to include something to
co-ordinate the changes.
Using ZEO to coordinate CGI scripts
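The attraction of ZEO is that it makes the coordination someone
else’s problem: every CGI process connects to a single ZEO
storage server, and ZODB transactions do the serializing. A
sketch of the client end, as I understand the API (the address,
the ‘pictures’ key, and picture_data are all made up):

    from ZEO.ClientStorage import ClientStorage
    from ZODB import DB
    from ZODB.POSException import ConflictError

    storage = ClientStorage(('localhost', 8100))  # hypothetical address
    db = DB(storage)
    connection = db.open()
    root = connection.root()

    for attempt in range(3):
        try:
            root['pictures'] = root.get('pictures', []) + [picture_data]
            get_transaction().commit()  # ZODB's old-style transaction API
            break
        except ConflictError:
            # Another process committed first; throw away our
            # changes and try again against the fresh state.
            get_transaction().abort()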
So far, getting uploads to work with the
Picky Picky Game has been an uphill
struggle. Some web browsers work and some do not, and it is
going to take some fancy diagnostic tools to find out why
not.
I am rethinking my original plan for the Picky Picky Game,
which was to store resources in files as often as possible. For
example, index.html is a static file (not dynamically generated
every time someone visits it). This requires that when
something happens that means index.html should change, the file
has to be updated.
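Rewriting a static file in place is at least a well-worn trick:
write the new version to a temporary file and rename it over
the old one, since rename is atomic on POSIX. A sketch (file
names made up):

    import os

    def rewrite_index(html):
        # Readers see either the old index.html or the new one,
        # never a half-written mixture.
        tmp = 'index.html.tmp'
        f = open(tmp, 'w')
        f.write(html)
        f.close()
        os.rename(tmp, 'index.html')  # atomic replacement on POSIX

It does not stop two writers racing each other, though; that
still needs the coordination discussed above.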
Pros and cons
As mentioned earlier, I am working on a version of the Picky Picky Game that does not need to
write to files (since some web servers will be set up that way
for security reasons). In some ways this simplifies things,
because it means there is only one URL hierarchy to worry about.
URL design
I demonstrated the Picky Picky Game prototype to the
CAPTION committee. The main trouble was picture resources not
being downloaded, or, oddly, vanishing when one refreshed the
page. It worked better on the dial-up than the broadband
connection (though Jo blames that on IE 6 being set up to cache
nothing). I resolved to make an effort to sort out caching, or
at least the things my server needs to do to enable caching to
work smoothly.
So, in my development system at home, I added
If-Modified-Since and If-None-Match (etag) support to the
routine that fetches picture data out of the database. I also
added an Expires header set, as RFC 2616 demands, approximately
one year into the future.
Result: none of the pictures appear.
The problem is that the web server I am using at home always
returns status code 200 for CGI scripts (it ignores the Status
pseudo-header). As a result, my clever 304 (‘Not Modified’)
responses result in apparently zero-length data. Argh!
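The CGI end of conditional GET is straightforward enough when
the server cooperates; roughly this (a sketch with made-up
variable names, and comparing If-Modified-Since by string
equality is a lazy shortcut that only works because the date
came from us in the first place):

    import os, sys

    def send_picture(etag, last_modified, expires, data):
        # Reply 304 if the client's cached copy is still good.
        if (os.environ.get('HTTP_IF_NONE_MATCH') == etag
                or os.environ.get('HTTP_IF_MODIFIED_SINCE') == last_modified):
            sys.stdout.write('Status: 304 Not Modified\r\n\r\n')
            return
        sys.stdout.write('Status: 200 OK\r\n')
        sys.stdout.write('Content-Type: image/png\r\n')
        sys.stdout.write('ETag: %s\r\n' % etag)
        sys.stdout.write('Last-Modified: %s\r\n' % last_modified)
        sys.stdout.write('Expires: %s\r\n\r\n' % expires)
        sys.stdout.write(data)

All for naught, of course, if the server throws the Status line
away.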
When I worked this out, I thought I would demonstrate to Jeremy
that it worked in the stand-alone server (which does not use
CGI). But Lo! all the pictures failed to appear once more. So
did the page itself. What gives?
This time the trouble was its logging function: it tried to
resolve the client IP address. Now, I thought that the address
used by my PowerBook did have reverse look-up in my local DNS,
but in any case the server should not be indulging in DNS
look-ups, given that on my system that is a blocking operation
that tends to mean the program locks up for 75 seconds. Luckily
BaseHTTPServer makes it easy to override the function that
indulges in DNS queries, and it all now runs smoothly.
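The override really is a one-liner against BaseHTTPServer’s
handler class:

    import BaseHTTPServer

    class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
        def address_string(self):
            # The default implementation calls socket.getfqdn(),
            # which does a reverse DNS look-up; just log the raw
            # IP address instead.
            return self.client_address[0]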
On the positive side, I have made one cache-enhancing change
that has worked, albeit only for the old panels (which saved
their images as separate disc files rather than in ZODB).
Simply put, there is another base URL in use (in addition to
the base of the web application and the base URL for static
files), and this one is for picture files. This means that
these old pictures are now, once again, served as static files,
with etags and caching the responsibility of my host HTTP
server, not me.
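In other words, the URL space now has three roots, along these
lines (the actual prefixes here are invented for illustration):

    # Hypothetical configuration: three base URLs.
    APP_BASE     = '/cgi-bin/pickypicky/'   # dynamic pages (CGI)
    STATIC_BASE  = '/pickypicky/static/'    # stylesheets and suchlike
    PICTURE_BASE = '/pickypicky/pictures/'  # the old panels' image files

    def picture_url(file_name):
        # Pictures under PICTURE_BASE are plain files, so etags and
        # caching are the HTTP server's business, not the script's.
        return PICTURE_BASE + file_name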