First Cut of PyCouchDb

I have been experimenting with CouchDb by creating the beginnings of a Python API for it. It is not really ready for prime time yet, but in the best free-software tradition I am publishing early and will add to it when I have time.

Update. In the time between posting to my blog and posting to the CouchDb group on Google, someone else started an identical project, couchdb-python. I’ll see whether any of my code can usefully be merged into that project, but by the look of things it is already a little further along than mine.

I have not created a proper distribution yet, but you can download PyCouchDb-0.0.tgz and install it by hand if you want to try it out.

CouchDb’s use of HTTP and JSON as the protocol, together with httplib2 and simplejson, makes this code almost trivial. Most of the work was in translating the unit tests from JavaScript to Python! :-) A Python API is nevertheless useful because it can be extended to do sanity-checking on parameters before making HTTP requests.

What follows is the first approximation to some documentation.

Module Contents

Module name: couchdb (I dithered for some time as to whether I should arrogantly claim the top-level name couchdb or create one in a private namespace like alleged.couchdb. In the end I have plumped for the former.)

Classes provided: Server, Db

Class Server

Represents a database server. You use it to create and delete databases, or to get a list of databases. You create the server object using the base URL of the server. For example:

import couchdb
 
server = couchdb.Server('http://localhost:8888/')

Creating and Deleting Databases

You create and delete databases using the database name. Database names are lowercase alphanumeric strings that may not start with an underscore.

my_db = server.create_db('applecart')

This returns a Db object. If the database already exists, you get a Db object attached to the existing database.

You also delete databases by name:

server.delete_db('applecart')

This returns a boolean which is true iff the database existed before you deleted it.
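
The naming rule above is the sort of thing a client can check before making an HTTP request. Here is a minimal sketch, not part of PyCouchDb; the function name is hypothetical, and the pattern assumes that underscores are allowed anywhere except the first character, which is only my reading of the rule.

```python
import re

# Assumed rule: lowercase letters and digits, with underscores allowed
# anywhere except the first character. This is not confirmed server behaviour.
_DB_NAME_RE = re.compile(r'[a-z0-9][a-z0-9_]*')


def is_valid_db_name(name):
    """Return True if name looks like an acceptable database name."""
    # fullmatch requires the whole string to fit the pattern.
    return _DB_NAME_RE.fullmatch(name) is not None
```

A wrapper around create_db could call this first and raise ValueError instead of sending a request that is bound to fail.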

Listing Databases

You can get a list of database names as follows:

db_names = server.all_db_names()

This returns a list of strings.

Class Db

You create a Db object from its base URI, which is formed from the server URI, the database name, and another slash:

import couchdb
 
db = couchdb.Db('http://localhost:8888/applecart/')

You can also get a database object using the Server.create_db method.
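
If you have the server URI and the database name separately, the base URI described above can be assembled with a small helper. This is only a sketch; db_uri is a hypothetical name, not a function in the module.

```python
def db_uri(server_uri, db_name):
    """Join a server URI and a database name into a Db base URI."""
    # Tolerate a server URI given with or without its trailing slash,
    # and always finish with the slash that Db expects.
    return server_uri.rstrip('/') + '/' + db_name + '/'
```

For the server used in the examples, db_uri('http://localhost:8888/', 'applecart') gives the URI passed to couchdb.Db above.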

Documents

A database is a repository of documents. A document is represented in Python as a JSON-compatible dictionary. JSON-compatible means:

  • All keys are strings; and
  • All values are JSON-compatible.

JSON-compatible values are:

  • scalars: None, Boolean, numbers, strings (Unicode or byte string);
  • sequences of JSON-compatible types; and
  • mappings whose keys are strings and values are JSON-compatible.
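
The rules above translate directly into a recursive check. The following is a sketch rather than code from PyCouchDb: the function name is hypothetical, it is written against Python 3's str and bytes types, and it treats lists and tuples as the sequence types.

```python
def is_json_compatible(value):
    """Recursively check a value against the rules listed above."""
    # Scalars: None, booleans, numbers and strings (text or bytes).
    if value is None or isinstance(value, (bool, int, float, str, bytes)):
        return True
    # Sequences: every item must itself be JSON-compatible.
    if isinstance(value, (list, tuple)):
        return all(is_json_compatible(item) for item in value)
    # Mappings: string keys and JSON-compatible values.
    if isinstance(value, dict):
        return all(isinstance(key, str) and is_json_compatible(item)
                   for key, item in value.items())
    return False
```

A document is then acceptable when is_json_compatible(doc) is true and doc is a dictionary.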

Document keys starting with an underscore are reserved for CouchDb’s use. There are two special keys:

  • _id is the identifier of the document (a Unicode string); and
  • _rev is the revision of the document (an integer).

You may supply a value for _id when creating the document. If you do not, the server will choose an opaque 32-digit hexadecimal number.
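
If you would rather choose identifiers of the same shape yourself, the standard uuid module produces 32-digit hexadecimal strings. This is a sketch only; I am not claiming it is how the server picks its identifiers, and new_doc_id is a hypothetical name.

```python
import uuid


def new_doc_id():
    """Return a random 32-digit hexadecimal string usable as an _id."""
    # uuid4().hex is the 128-bit random UUID written as 32 hex digits.
    return uuid.uuid4().hex
```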

Basic operations: put, get, delete

The Db object has methods put and get for storing documents in the database and retrieving them.

doc = {'_id': 'fredj', 'name': 'Fred Jones', 'coins': 189}
db.put(doc)

This returns an object with _id and _rev attributes that describe the new document. As a side effect, the dictionary you passed in is also updated with entries for _id (if not already specified) and _rev.

You retrieve documents by document identifier:

doc = db.get('fredj')
print doc['coins']

To update a document you retrieve it, modify it, and put it again.

doc['coins'] += 5
db.put(doc)

It is important that the document you put back is the original document, modified, or at least that it has the correct _id and _rev members. These are used to check that no-one has modified the document between your obtaining it and storing it back.

In the unlikely event that there was such a change, a ConflictException is raised. At this point your program may choose to retrieve the value, make the change again, and resubmit it, or take some other action.
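
The retrieve, change and resubmit cycle can be wrapped in a small helper. The sketch below is self-contained, so it defines its own ConflictException and a hypothetical in-memory FakeDb that imitates the revision check described above; neither is the real PyCouchDb code, and update_with_retry is an illustrative name.

```python
class ConflictException(Exception):
    """Stand-in for the exception named in the text."""


class FakeDb(object):
    """Hypothetical in-memory stand-in for Db, for illustration only."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy, as a fresh fetch from the server would.
        return dict(self._docs[doc_id])

    def put(self, doc):
        stored = self._docs.get(doc['_id'])
        # Reject the write if the caller's _rev is not the current one.
        if stored is not None and stored['_rev'] != doc.get('_rev'):
            raise ConflictException(doc['_id'])
        doc['_rev'] = (stored['_rev'] if stored else 0) + 1
        self._docs[doc['_id']] = dict(doc)


def update_with_retry(db, doc_id, change, attempts=3):
    """Fetch, apply change(doc) and put; refetch and retry on conflict."""
    for _ in range(attempts):
        doc = db.get(doc_id)
        change(doc)
        try:
            db.put(doc)
            return doc
        except ConflictException:
            continue  # someone else got in first; start from a fresh copy
    raise ConflictException(doc_id)
```

The change is passed as a function so that it can be reapplied to the freshly fetched document on each attempt, rather than to a stale copy.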

Finally, you can delete a document.

db.delete(doc)

Note that doc must be a dictionary, and contain at least the _id and _rev members. Deletions can lead to conflict exceptions as well.

Queries and Views

A view is a JavaScript function that is executed on every document in the database. The values returned by the function are accumulated into a set of JavaScript objects called rows, which are returned as a sequence.

A query is a one-time (temporary) view.

result = db.query("""
    function (doc) {
        if (doc.coins > 10) {
            return doc;
        }
    }
    """)
print result.total_rows
for doc in result.rows:
    print doc['name'], doc['coins']

Exactly what you get back in the rows member depends on the JavaScript.
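
To make the view semantics concrete, here is a pure-Python analogue of what the description above implies: apply a function to every document and collect whatever it returns. It is a model of the idea only, with hypothetical names; it says nothing about how the server orders rows or computes total_rows.

```python
def run_view(docs, view_function):
    """Apply view_function to every document and collect what it returns."""
    rows = []
    for doc in docs:
        value = view_function(doc)
        if value is not None:  # None plays the role of "returned nothing"
            rows.append(value)
    return rows


def rich(doc):
    # Same test as the JavaScript above: keep documents with more than 10 coins.
    if doc.get('coins', 0) > 10:
        return doc
```

Calling run_view(docs, rich) on a list of dictionaries gives back only those with more than ten coins, just as the JavaScript query returns only matching documents.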

Information About the Database

The total number of documents is in the database’s info:

print db.info().doc_count

To get a list of all document ids:

for info in db.all_doc_infos():
    print info._id, info._rev

Pretending to be a Dictionary

You can also treat the database as a dictionary, with document ids as its keys and documents as its values. Sometimes these methods are less efficient (use more round-trips to the server) than using put and get explicitly.

In particular, you can loop over all documents one at a time like this:

for doc in db.itervalues():
    ...doc...

Because this is an iterator it only loads the documents on demand.
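
On-demand loading of this kind is naturally expressed as a generator. The sketch below is not the module's implementation; iter_documents and its fetch argument are hypothetical, with fetch standing in for whatever makes the round-trip to the server.

```python
def iter_documents(doc_ids, fetch):
    """Yield documents one at a time, calling fetch(doc_id) only on demand."""
    for doc_id in doc_ids:
        # Nothing is fetched until the caller asks for the next document.
        yield fetch(doc_id)
```

If the loop is abandoned early, the remaining documents are never requested.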