One of the differences between Django and many older web application platforms such as PHP, ASP, ASP.NET, JSP, etc., is that it has no implicit URL routes: every URL that is handled by your Django site is handled by matching it in a URLconf. And this is a good thing.
Background
URLs work like a government department: there is no central list of
files, so instead if someone wants the list of sports centres is, they
will ask you to ‘go to Millie and ask for the list of sports centres’.
The URL http://jeremyday.org.uk/projects/ tells you (1) use the HTTP
protocol, (2) talk to the computer named jeremyday.org.uk
, and (3) ask
for /projects/
. These parts of the URL are called the scheme, the
authority, and the path.
Within the server there is a routine that decides, given the path, which procedure to use to generate the response. The process of mapping a URL path to the routine is called URL routing.
URL Routing Using the File System
The file system on your computer is designed to map names on to data, so why not use that to organize your web server?
Early web servers had a single routing algorithm: data for the
response is stored in files such that the file path was the same as the
URL path. For example, if you set Apache’s DocumentRoot
directive
to /var/www/jeremyday/
, then the file /var/www/jeremyday/foo.html
is
used when the server gets a request for /foo.html
. So very simple and
lovely.
It gets more complicated when you remember that not all possible URL
paths are valid file names. In particular, directly mapping /projects/
is impossible because file names cannot end in a slash. (Actually on
some systems directories are also technically files, but here we mean
the sort of file that can contain HTML data.) So the DirectoryIndex
directive tells Apache to treat /projects/
as if the path had been
specified as /projects/index.html
.
Sometimes you want a program to generate the response data; Apache added
a system where certain file folders were expected to contain programs
(CGI scripts) instead of HTML files. So more directives were
added to control this. Later support for programs written in languages
like PHP was bolted on with the AddHandler
directive. Because it works
by recognizing the .php
suffix, PHP requires URLs to end in .php
.
URL Rewriting
Systems like Drupal want instead to have URLs like
http://drupal.org/node/69500
. to do this you have to add another layer
of complexity to your configuration, called URL rewriting. The
following Apache directives should do the trick:
# Rewrite current-style URLs of the form 'index.php?q=x'.
RewriteCond %{REQUEST_URI} "!cgi-bin/"
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
The upshot of this is that /node/69500
is rewritten as
index.php?q=node/69500
, at which point index.php
is recognized as
invoking PHP. Drupal’s code then takes over, and begins the URL routing
process from scratch starting with node/69500
.
The upshot of this is that to let Drupal do its own URL routing, you need to add a dozen directives expressly to disable Apache’s default URL routing. Apache directives are notoriously difficult to get right at the best of time, and any configuration file involving rewriting rules is automatically confusing and difficult to debug (see sendmail for another example), so getting URL rewriting right is sometimes viewed as a back art.
Comingling
Another problem with file-based routing is that you have all your code files and other files mixed up together. This means
-
You have to add more directives to tell Apache to not serve your source code to naughty children eager to find out your database passwords.
-
If naughty people can upload a file called something.php on to your server, they can run their own code on your server—a security disaster. If they can clobber one of your own
.php
files it is worse. -
It can make maintenance more complicated if you have different people updating the code files versus the stylesheets and graphics.
-
On a high-traffic site you will want to move static files on to a separate web server (or a content-delivery network), and this is harder if the designers of the web pages assumed the files are all mixed up together.
That doesn’t mean there’s no place for a file-system-based web server, just that it should not be the default or sole URL routing algorithm.
Routing without Files
Deployment with Nginx (or similar)
Nginx is one of several small-footprint web servers that whose job is to speak HTTP to the outside world and then pull the result from its cache or dispatch to the real application server as quickly as possible. Nginx has modules for static files, HTTP, and FastCGI as resolution mechanisms; you can in principle recompile it with just the modules you need.
On my own practice server I have Nginx proxying to my Django application with FastCGI. I think this is a pretty good arrangement (doubtless someone can tell me I have missed a trick and there is a better way; there always is).
With Nginx, you can essentially hand over the entire URL to the application framework as-is with minimum mucking about. What happens next is up to your framework.
Zope and Rails
I played with Zope some years ago, so what I am describing is probably out of date.
Zope had a routing algorithm based on Python objects: the path is mapped on to a hierarchy of objects. Each element in the but the last two are dictionaries in which the next segment is found. The last segment is the method name.
Rails takes a similar approach. The default routes handles a path
/forum/edit/123
by creating an instance of a class named ForumController
,
which must be defined in a file app/controllers/forum_controller.rb
, and
invoking a method named edit
. A dictionary called params
will contain
an entry mapping :id
to 123
.
In both cases the routing is no longer tied to files on the web server, which is nice, but it is tied to the names of methods and classes in the code.
Drupal
[Drupal][] 6 calls its URL routing system menus. Each module can offer a set of menu items, and these are knitted together in to the hierarchical list with wildcards that specify entities to load. Being implemented in PHP, the code is rather clunky:
$items['example/baz/%example_baz/edit'] = array( 'title' => 'Baz', 'page callback' => 'example_baz_edit_page', 'page arguments' => array(2), 'access arguments' => array('access baz content'), 'type' => MENU_CALLBACK, );
This says that paths like /example/baz/123/edit
will cause the
function load_example_baz('123')
to be called, and if it returns
non-false the page function example_baz_edit_page
will be called with
that return value as its argument. (The type
entry specifies that this
defines a URL route that has no corresponding item in the menu as shown to the end user.)
Drupal modules have to choose disjoint prefixes to avoid name clashes with other modules.
When Interaction Designers Attack
These routing methods assume some sort of sensible mapping between URL paths and the underlying object types in your database.
This all comes unstuck when you are working on a government web site and
your designer decides that the QCDA forum should be
/qcdadirect/feedback
whereas the nine regional forums should be called
things like /communities/regions/yorkshireandhumberside
, because that
better reflects the user’s journey through the site. Or perhaps you are
implementing a news website and want entries to have paths like
/2010/08/05/man-bites-dog
rather than /node/12345
.
With Ruby it is possible to address this through explicit routing:
map.with_options(:controller => article, :blah => 'baz') do |article_map|
article_map.news_front_page '/', :action => :front_page
article_map.news_article '/:year/:month/:day/:slug',
:action => :show,
:year => /20[0-9][0-9]/,
:month => /[01][0-9]/,
:day => /[0-3][0-9]/,
:slug => /[a-z-]+/
end
Of the parameters, those with regexes as values constrain elements in
the path, those without are default values that do not have to be in the
URL, and it took me a while to realize the method name is being used to
pass another parameter, the name of this route. I spent ages trying to find where the
method news_article_path
was defined.
With Drupal as an administrator you have the option of installing the
paths
module and adding URL aliases for the pages whose paths you care
about. This works pretty well—code can use the internal paths, and they
are converted to aliases in the HTML—but you would have to define an
alias for every article. (And they did!) You could also create a module
whose sole contents was PHP code for new menu entries that use the
desired paths and map them on to the existing modules’ functions.
You can even do URL routing in ASP.NET 3.5 or so; I know this because I
created a site using them to let me have URLs like /video/1
. The code
is naturally far more verbose than the Django or Rails equivalent, and
you need special incantations in the configuration file to suppress the
default routing algorithm. It also did not work on our web server—still
running ASP.NET 2.0—so I had to rewrite it to use rather the style
Video.aspx?videoID=1
after all. Oh, well.
How Django Differs
Explicit Routes
Django has no default routing at all: cleaving to the Python principle that explicit is better than implicit, the URLs understood by your site are listed in your URLconf, a Python file mapping regexes to the Python callable that implements them:
more_args = {'blah': 'baz'}
urlpatterns = patterns('example.newspaper.views',
(r'^$', 'front_page', more_args, 'news_front_page'),
(r'^(?P<year>20[0-9][0-9])/(?P<month>[012][0-9])/(?P<day>[0-3][0-9])/(?P<slug>[a-z-]+)$',
'article', more_args, 'new_article'),
)
For example, given path /2010/08/05/man-bites-dog
, it matches the fourth
entry and calls the function example.newspaper.views.article
as if invoked as follows:
article(request, year='2010', month='08', day='05',
slug='man-bites-dog', blah='baz')
Admittedly anything involving regexes is not going to be pretty, but I
think the Django version is about as concise and flexible as you can
get. The view function (which is Django’s term for what Rails calls a
controller’s action method) can be just a function or it can be an
object method or any object with a __call__
method. Pattern-matching
with regexes is well-understood and is fast.
Separation of Static Files (No comingling)
Django makes mixing static files in with your dynamic pages as
inconvenient as possible. There is a way to serve static files from
your development server but it is discouraged. Since Apache is
installed by default on Mac OS X, I do my development with the static
files served directly by Apache via a symbolic link, just to keep things
nicely separated. On the production site, I use a separate domain name
static.jeremyday.org.uk
for static files used by jeremyday.org.uk
. they
are both the same server, but on a larger site with many visitors I
could easily switch to using a separate static server.
This also means the security concerns of having code files on your web server’s file system can be greatly reduced.