Web servers started as a way of getting information from other sites. Then it became convenient to use HTML and HTTP on one's local-area network, and for some reason we had to call that idea an 'intranet' to make people pay attention. Sometimes it is useful to run a mini-server on the same computer as your desktop application; in this note I'll discuss this idea in the context of an application written to Microsoft's .Net platform, since that's what we use at work.
Desktop web servers
The idea of running a web server locally is not new. Dave Winer used this technique in Radio Userland something like ten years ago: you run a web server on the user's desktop, and the user-interface of the program runs as HTML in their favourite web browser. When the application is about building weblogs, it makes some sense to hand over the job of displaying HTML to the web browser, rather than trying to develop your own.
Depending on your operating system, presenting your UI in HTML does not even mean it has to look like it is running in a web browser if you don't want it to. On Microsoft Windows, you can turn your HTML page into an HTML Application (HTA) by changing the file name to end in .hta; if you have installed the Microsoft .Net Framework or Microsoft SQL Server 2005 recently, you will have seen an HTA in action: the installer front-end is implemented as HTML in this way. On the Mac, the closest equivalent is Dashboard widgets. Another possibility is to have your application use an embedded web browser as part of its UI.
Obviously the performance requirements for a desktop web server are different from one used to field requests from all over the internet. This means that the footprint of the server can be much smaller: it does not need a farm of worker processes to be set up, or elaborate support for caching, esoteric HTTP headers, or multi-layered security. Well-known, 'industrial-strength' web servers like Apache and Microsoft IIS are too heavyweight for this niche: you want a server that can scale down, not up.
Why use a Desktop Web Server in .Net?
Without going into details (for reasons of client confidentiality), I think I can admit to working on a desktop application using Microsoft .Net and its WinForms class library. We want to allow users to add comments to things in the program which for this article I shall call boojums; the customer wants threaded, bulletin-board-style comments. While it would be possible to display indented, hierarchical comment lists with WinForms widgets, it is easier to see how to do this in HTML (WinForms is already obsolete, so learning how to do it in WinForms would be a waste of time).
We already display some other info via an embedded browser window. For our existing use of the browser control, we simply supply a static HTML file for it to display (this involves making lots of temporary files). For the comments, I will need users to be able to click on comments to show and hide them, and also a way to reply to a particular comment. For this I want a mechanism for the JavaScript running in the HTML to talk to the host application; rather than work out how to install a new protocol (which is how Microsoft solved this for the Books Online used in Microsoft Visual Bedsit Apartment Workshop .NET Solution Suite 2003), it seemed more straightforward to run a desktop web server for the HTML in the comments view to talk to.
The trick is that the .Net worldview is all about web servers being big; in fact, so far as .Net is concerned there is only one web server, IIS, and only one way to write web pages, ASP.NET, both of which are far too clunky to bolt on to the side of a desktop app. So I have to roll my own. This is the opposite of the situation with Python, which is so eager to make writing web servers easy that people complain there are too many web frameworks!
Adapting WSGI for .Net
You could spend ages designing a web server, and working out how to divide it into self-contained chunks that can be tested independently (I definitely do not want debugging of the web server to be mixed in with running the GUI application). To save time, I decided to start by adapting the simplest web-server API I know of: Python's WSGI (Web Server Gateway Interface). This is interesting because WSGI exploits Python's flexibility to define an interface using no class definitions at all: your web application is not a class or a swarm of classes, but a single callable object (a function, or a method on a class instance, or an object that pretends to be a function by implementing a method named __call__), and the web server exposes its functionality as a dictionary and another callable. In .Net I use delegates where Python uses callables (I could have used interfaces with a single member to similar effect).
Thus we have a division of labour that goes like this. The application-specific 'top half' of the server is a function that matches this signature:
public delegate IEnumerable HandleRequestDelegate(
    IDictionary environ,
    StartResponseDelegate startResponse);
where environ has entries like PATH_INFO and REQUEST_METHOD based on the CGI conventions, and startResponse is a function supplied by the application-ignorant, HTTP-savvy 'lower half', and has this signature:
public delegate void StartResponseDelegate(
    string status, IList headers);
The handler function examines environ to get the information about the request. It then calls startResponse to begin its response, supplying the headers for the HTTP response as an IList of DictionaryEntry objects (this is the closest I could find to an equivalent of WSGI's list of tuples; I wonder if an IDictionary might have done just as well). The result of the function is an IEnumerable that returns zero or more chunks of the body of the response. These chunks can be strings (which are always encoded as UTF-8 in my server) or byte arrays.
A simple request-handler function would look like this:
private IEnumerable HandleRequest(
        IDictionary environ, StartResponseDelegate startResponse) {
    ArrayList headers = new ArrayList();
    headers.Add(new DictionaryEntry("Content-Type", "text/plain; charset=UTF-8"));
    startResponse("200 OK", headers);
    return new string[] { "Hello, World!" };
}
This is a direct adaptation of the corresponding Python WSGI code:
def handle_request(environ, start_response):
    headers = [('Content-type', 'text/plain; charset=UTF-8')]
    start_response('200 OK', headers)
    return ['Hello, World!']
You now create the server and start it up like this:
InProcWebServer server = new InProcWebServer(
    new HandleRequestDelegate(HandleRequest));
server.Start();
Uri baseUri = server.BaseUri;
The BaseUri property is something like http://127.0.0.1:2134/, and is used to generate URLs that talk to the new server. Because the hostname is 127.0.0.1, only processes running on this computer can connect to the server. The port number 2134 is generated by the operating system, and is different each time you run the program.
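Relative URLs can then be resolved against that base; for instance (my own illustration, continuing the snippet above):

// Illustrative only: building request URLs from the server's base URI.
Uri commentsUri = new Uri(server.BaseUri, "boojums/fred/comments");
// e.g. http://127.0.0.1:2134/boojums/fred/comments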
This leaves me with two tasks:
- Write the bottom half: the web server, which listens for requests, parses them to create the environ dictionary, and then calls the handle-request function; and
- Write the top half: a method on the application object that implements the handle-request function.
The web server part is (at least in principle) reusable in other projects.
Writing the lower half
The first few unit tests use the WebRequest and WebResponse classes to talk to the HTTP server. Since this trivial web application does not look at the environ parameter, I was able to write the first unit tests to exercise the server threading without having to write the entire HTTP parser in one go.
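Such a test might look something like this (a sketch only: I am assuming an NUnit-style test framework, that the hello-world HandleRequest shown earlier lives on the test fixture, and that the server has a Stop method):

using System.IO;
using System.Net;
using NUnit.Framework;

[TestFixture]
public class InProcWebServerTests {
    [Test]
    public void ServesHelloWorld() {
        // HandleRequest is the hello-world handler shown above.
        InProcWebServer server = new InProcWebServer(
            new HandleRequestDelegate(HandleRequest));
        server.Start();
        try {
            WebRequest request = WebRequest.Create(server.BaseUri);
            WebResponse response = request.GetResponse();
            StreamReader reader = new StreamReader(response.GetResponseStream());
            Assert.AreEqual("Hello, World!", reader.ReadToEnd());
            reader.Close();
            response.Close();
        } finally {
            server.Stop();  // assuming the server has a Stop method
        }
    }

    // (The HandleRequest method shown earlier would live here too.)
}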
This was the first time I'd needed to investigate the .Net class library's networking classes. They supply a Socket class that does all the things that a Unix programmer from the 1990s would expect. They also have a wrapper, TcpListener, that does bind, listen, accept for you. The greatest difficulty was in guessing the class name of the synchronization object I would need to co-ordinate the server (background) thread with the caller: AutoResetEvent, as it turned out.
The reason I need to synchronize the two threads is that the socket for the server is created without specifying a port number: the system supplies a random unused port. So my calling thread---which needs to know the port number---has to be suspended long enough for the TcpListener to be created and ready to divulge this piece of information.
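The shape of that dance is roughly as follows (a sketch with made-up names, not the real InProcWebServer code): Start launches the background thread and then blocks on the event until the listener is bound and the port is known.

// Sketch of the synchronization dance (field and method names are made up).
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public class ServerSketch {
    private TcpListener listener;
    private Uri baseUri;
    private AutoResetEvent ready = new AutoResetEvent(false);

    public Uri BaseUri { get { return baseUri; } }

    public void Start() {
        Thread thread = new Thread(new ThreadStart(Run));
        thread.IsBackground = true;
        thread.Start();
        ready.WaitOne();  // block until the listener is bound and the port is known
    }

    private void Run() {
        // Port 0 asks the operating system to pick any free port on loopback.
        listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint) listener.LocalEndpoint).Port;
        baseUri = new Uri("http://127.0.0.1:" + port + "/");
        ready.Set();  // wake the caller: BaseUri is now valid
        // ... the accept loop goes here (see below) ...
    }
}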
After that, the server thread starts a loop along the lines of
for (;;) {
    TcpClient client = listener.AcceptTcpClient();
    if (state == State.Stopping) {
        break;
    }
    Thread thread = new Thread(new ThreadStart(
        new Acceptor(client).Run));
    thread.IsBackground = true;
    thread.Start();
}
I then define an inner class Acceptor that basically does nothing except provide one method---Run---and a place to store its one variable, the client. The method Acceptor.Run handles one HTTP request, issues the response, and then quietly terminates.
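Its skeleton is roughly this (a sketch that lives inside the server class; the parsing and response-writing it elides are discussed below):

// Sketch of the Acceptor inner class (illustrative; details elided).
private class Acceptor {
    private TcpClient client;

    public Acceptor(TcpClient client) {
        this.client = client;
    }

    public void Run() {
        NetworkStream stream = client.GetStream();
        try {
            // ... parse the request from stream, build the environ
            //     dictionary, call the handler, write the response ...
        } finally {
            stream.Close();
            client.Close();
        }
    }
}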
The next step is easier if you have two monitors. You display a copy of RFC 2616 (or the W3C's sectionized version) on one screen and your development editor on the other. I mentioned earlier that you can write the first few unit tests so they only need you to parse the Request-Line (section 5.1), leaving the message headers for later.
When parsing the incoming stream, I wrote the code to work byte by byte, taking care not to copy character data into a StringBuilder until after the byte stream had been passed through my scanner. The point of this is that the syntax of HTTP requests is defined in terms of bytes (or octets), for which ASCII is a convenient representation, and not character strings (which in .Net are something different). I also did not want to risk .Net being too clever about line-endings and confusing matters. This is particularly pertinent when parsing the message-headers, where you might think you can just use the built-in ReadLine routine, but then get confused over splicing in continuation lines and not eating the whitespace at the start of the message body. To get it right, read the RFC and draw yourself a finite state machine on a piece of paper, that's my advice.
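To give a flavour of the byte-by-byte approach, here is a sketch of a header reader that copes with continuation lines (a simplified illustration of the idea, not the production scanner; a real one also has to cope with bare line-feeds, malformed headers, and size limits):

// Sketch: read message-headers byte by byte, handling continuation lines.
using System.Collections;
using System.IO;
using System.Text;

public class HeaderReaderSketch {
    // Reads headers up to and including the blank line, leaving the
    // stream positioned at the start of the message body.
    public static IList ReadHeaders(Stream input) {
        ArrayList headers = new ArrayList();
        string line;
        while ((line = ReadLine(input)).Length > 0) {
            if (line[0] == ' ' || line[0] == '\t') {
                // Continuation line: append to the previous header's value.
                int last = headers.Count - 1;
                DictionaryEntry previous = (DictionaryEntry) headers[last];
                headers[last] = new DictionaryEntry(
                    previous.Key, previous.Value + " " + line.Trim());
            } else {
                int colon = line.IndexOf(':');
                headers.Add(new DictionaryEntry(
                    line.Substring(0, colon),
                    line.Substring(colon + 1).Trim()));
            }
        }
        return headers;
    }

    // Reads one CRLF-terminated line as ASCII, one byte at a time,
    // rather than trusting a StreamReader to guess at line-endings.
    private static string ReadLine(Stream input) {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = input.ReadByte()) != -1) {
            if (b == '\r') {
                input.ReadByte();  // assume the next byte is '\n'
                break;
            }
            line.Append((char) b);
        }
        return line.ToString();
    }
}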
After that you need to call the HandleRequestDelegate, and then pump out the headers and body content to the socket stream, and the server's job is done.
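That last step looks roughly like this (again a simplified sketch of the idea, not the production code):

// Sketch: write the status line, headers, and body chunks to the socket stream.
using System.Collections;
using System.IO;
using System.Text;

public class ResponseWriterSketch {
    public static void WriteResponse(
            Stream output, string status, IList headers, IEnumerable body) {
        StringBuilder head = new StringBuilder();
        head.Append("HTTP/1.1 ").Append(status).Append("\r\n");
        foreach (DictionaryEntry header in headers) {
            head.Append(header.Key).Append(": ")
                .Append(header.Value).Append("\r\n");
        }
        head.Append("\r\n");

        // The status line and headers are ASCII by definition.
        byte[] headBytes = Encoding.ASCII.GetBytes(head.ToString());
        output.Write(headBytes, 0, headBytes.Length);

        // Body chunks may be strings (encoded as UTF-8) or byte arrays.
        foreach (object chunk in body) {
            byte[] bytes = chunk is string
                ? Encoding.UTF8.GetBytes((string) chunk)
                : (byte[]) chunk;
            output.Write(bytes, 0, bytes.Length);
        }
        output.Flush();
    }
}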
Developing the top half
The other side of the fence is the top half, the bit which knows about the application. The neat thing about WSGI is that, apart from the unit testing required to show the server stuff works, you do not need to test all of the application-savvy half via the HTTP server: instead the unit tests can just create an environ dictionary, feed it to HandleRequest, and check the header list and body that result. This makes the unit tests a little less complicated (and probably makes them run faster, too).
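A sketch of such a test (same caveat as before about the test framework; HandleRequest here is the hello-world handler from earlier, standing in for a real application handler):

using System.Collections;
using NUnit.Framework;

[TestFixture]
public class TopHalfTests {
    private string status;
    private IList headers;

    // Stand-in for the server's startResponse callback: just record the arguments.
    private void StartResponse(string status, IList headers) {
        this.status = status;
        this.headers = headers;
    }

    [Test]
    public void HandlerSaysHello() {
        IDictionary environ = new Hashtable();
        environ["REQUEST_METHOD"] = "GET";
        environ["PATH_INFO"] = "/";

        IEnumerable body = HandleRequest(
            environ, new StartResponseDelegate(StartResponse));

        Assert.AreEqual("200 OK", status);
        foreach (object chunk in body) {
            Assert.AreEqual("Hello, World!", chunk);
        }
    }

    // (The HandleRequest method shown earlier would live here too.)
}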
After a while patterns began to emerge in the top-half code. For example, a lot of the URL namespace managed by the server was split into segments like /static (read from files on disc) and /boojums (information about a boojum called fred would be under URLs like /boojums/fred/comments). In the first iteration of the code, this was done using repetitive code:
string path = (string) environ["PATH_INFO"];
if (path.StartsWith("/static/")) {
    string fileName = path.Substring("/static/".Length);
    ... look for a file and return its contents ...
} else if (path.StartsWith("/boojums/")) {
    string rest = path.Substring("/boojums/".Length);
    ... look for a boojum and return information about it ...
} ...
This began to get overly elaborate when I wanted the GUI to be able to add an extra URL path, /state, that would tell the caller which was the current boojum, and I knew I would need to add HTTP methods to update boojum properties as well as extract them. So I spent Friday factoring out 'middleware' components.
Middleware is the WSGI term for an object that handles requests by invoking other request handlers---and doing a little bit of work in between. One example is the class Selector, a simplified version of the Python WSGI Selector. The Selector class has an Add method you use to register handlers for parts of the URL namespace:
Selector selector = new Selector();
selector.Add("/static", "GET", new HandleRequestDelegate(
    new FileServer(dirName).HandleRequest));
selector.Add("/boojums", "GET", new HandleRequestDelegate(Boojums_GET));
selector.Add("/boojums", "POST", new HandleRequestDelegate(Boojums_POST));
InProcWebServer server = new InProcWebServer(
    new HandleRequestDelegate(selector.HandleRequest));
My selector is not yet as clever as the WSGI one: it does not extract URL parameters. All it does is keep track of the PATH_INFO and SCRIPT_NAME entries in environ, shifting the matched portion across so that the inner request-handler thinks its namespace starts at /.
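The shifting itself amounts to something like this (a sketch of the idea, not the real Selector code, which also matches the HTTP method and dispatches to the registered handler):

// Sketch: move a matched prefix from PATH_INFO onto SCRIPT_NAME so the
// inner handler sees its own namespace rooted at "/".
using System.Collections;

public class PathShiftSketch {
    public static void ShiftPath(IDictionary environ, string prefix) {
        string scriptName = (string) environ["SCRIPT_NAME"];
        string pathInfo = (string) environ["PATH_INFO"];
        environ["SCRIPT_NAME"] = scriptName + prefix;
        environ["PATH_INFO"] = pathInfo.Substring(prefix.Length);
    }
}

For example, with SCRIPT_NAME empty and PATH_INFO of /boojums/fred/comments, shifting /boojums leaves SCRIPT_NAME as /boojums and PATH_INFO as /fred/comments, which is all the boojum handler needs to see.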
There are two advantages to extracting the Selector class from the application code. First, it replaced a lot of repetitive ad-hoc switch and if statements with declarative code, which reduces the amount of application-specific code by moving this functionality into the generic class library (which might be reused on other projects). Second, my boojum class-library could supply the implementation of /boojums, and the GUI layer can add another handler for /state without the class library needing to know about it.
Similarly, we need a way to synchronize access to the database of boojums, since the database is not thread-safe. After a few false starts I settled on another middleware class, which has a member variable handleRequest set via its constructor, and a public HandleRequest method that does whatever is necessary for synchronization and then calls the handleRequest delegate. When unit-testing the code, we cannot use the same synchronization code as we do in the GUI (because there we use the magical method Control.Invoke); spinning it out into a separate middleware class allows us to test the rest of the code with a different synchronization technique.
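A sketch of such a class, using a plain lock for the test-friendly variant (the GUI variant would marshal the call on to the UI thread with Control.Invoke instead; the names here are illustrative):

// Sketch: middleware that serializes calls to an inner request handler.
using System.Collections;

public class SerializingMiddleware {
    private HandleRequestDelegate handleRequest;
    private object padlock = new object();

    public SerializingMiddleware(HandleRequestDelegate handleRequest) {
        this.handleRequest = handleRequest;
    }

    public IEnumerable HandleRequest(
            IDictionary environ, StartResponseDelegate startResponse) {
        // Only one request at a time may touch the inner handler.
        lock (padlock) {
            return handleRequest(environ, startResponse);
        }
    }
}

Note that this only serializes the call to the inner handler, not the later enumeration of the body it returns; that is fine as long as handlers build their whole response up front, as the examples in this article do.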
Does it work?
So far I have spent a week developing the web server and wiring it into the application, and it has now reached the point where it works: the HTML page shown in the browser widget gets the current boojum's address, and then gets the list of comments and displays them. It also displays a form that can be used to add a comment to the boojum. There are still some details to sort out (such as making the form appear when you click Add Comment, rather than having it clutter up the place all the time), but it is looking pretty good for a week's effort.