Database Disruption From The Couch

Web development typically follows a specific set of steps. There are different workflows for different people, and I’m not trying to establish a best-practices workflow here (just the opposite, as you’ll see), but the differences are usually in the order in which the steps are performed, not the actual steps themselves.

The first step (again, conceptually the first, not necessarily chronologically) is to design a database for the application’s needs – the database tier. That means breaking down the various components of your application into tables. For example, if you are building a blog engine, you would need at least a table for articles and another for comments.
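
To make the blog example concrete, here is a rough sketch of that database tier using Python’s built-in sqlite3 module. The column names are my own guesses at what a minimal blog engine would need, not a recommended schema, and SQLite is only standing in for whatever SQL database you actually use.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical minimal schema: one table for articles, one for comments.
conn.executescript("""
    CREATE TABLE articles (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body  TEXT NOT NULL
    );
    CREATE TABLE comments (
        id         INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL REFERENCES articles(id),
        body       TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO articles (id, title, body) VALUES (1, 'Hello', 'First post')")
conn.execute("INSERT INTO comments (article_id, body) VALUES (1, 'Nice post')")

# Reassembling an article with its comments takes a join across both tables.
rows = conn.execute("""
    SELECT articles.title, comments.body
    FROM articles
    JOIN comments ON comments.article_id = articles.id
""").fetchall()
print(rows)
```

The structure has to exist before any data goes in, and reading the data back means stitching the tables together again.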

The second step is to write a “middleware” tier – that just means software that sits in between the client and the database. You don’t want people to connect to the database directly or they’d be able to wreak havoc. The middleware tends to be where you do things like keeping track of who’s allowed on what page.

The last step would be to design a “view” or “presentation” tier to let your clients interact with your database in a (hopefully) safe manner. The view tier, among other things, takes the data that is broken up into tables and puts it back together again.

Ruby on Rails, which I’ve loved for years, excels at connecting these dots. ActiveRecord, which I consider the crown jewel of Rails, does a fantastic job of simplifying the complex work of creating and managing the database, for example. Rails also handles the middleware tier quite well with ActionController. The view tier (ActionView) doesn’t add all that much, but server-side view tiers rarely do when it comes to the web.

This is all going to be obsolete with CouchDB*.

What makes CouchDB so damn disruptive is not the fact that it beats frameworks like Rails – it’s that it completely side-steps them.  CouchDB dispenses with structured tables altogether, relying on “documents” instead. Documents are arbitrarily complex data structures that contain as much or as little data as you need. The (partially completed) free book on the official site uses the example of the business card.  Some business cards have fax numbers, and some don’t.  Some have website addresses, and some don’t.  In a classic, structured database, you would need to have a column in the database for fax numbers, even if only one business card in your database has one, because the structure of the database has to be in place before the data is added.  If the structure doesn’t accommodate the data, then the structure has to be altered, which is always a potentially dangerous thing (Rails handles this with migrations).  CouchDB lets you put whatever data you want in whatever document (in this case, business card) you want.  That means that instead of having to add a fax number column to every business card in the database, you could simply add a fax number to the new document and only the new document.
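
Here is a small sketch of the business card idea, with documents modeled as plain Python dictionaries (CouchDB documents are JSON, so the mapping is close). The names and numbers are invented for illustration; `_id` is the field CouchDB uses to identify a document.

```python
import json

# Two business cards as CouchDB-style documents. All values are made up.
card_plain = {"_id": "card-alice", "name": "Alice", "phone": "555-0100"}
card_with_fax = {
    "_id": "card-bob",
    "name": "Bob",
    "phone": "555-0101",
    "fax": "555-0102",
    "website": "http://example.com",
}

cards = [card_plain, card_with_fax]

# No schema change was needed for the fax number: only Bob's document has it.
cards_with_fax = [card["name"] for card in cards if "fax" in card]
print(cards_with_fax)

# This JSON is roughly what would be sent to the database.
print(json.dumps(card_with_fax, indent=2))
```

Alice’s card never hears about fax numbers at all, which is the whole point.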

ActiveRecord is amazing at simplifying all the structured SQL stuff.  But once that complexity starts to go away, the value added by ActiveRecord starts dropping off dramatically.  If the database tier is already super-simple, why bother with a fairly hefty helper library on top of that?  CouchDB itself makes ActiveRecord (and other fancy ORMs, like Hibernate) largely obsolete (and even quaint).

Another cool feature of CouchDB is that it uses HTTP for all interactions.  That means the database is, in effect, a web server.  Since CouchDB has built-in HTTP features like authentication, cookies, and caching, there’s not much left for the middleware tier to do, especially given that CouchDB works directly with AJAX, so much more logic can be put in the browser.
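
To show what “HTTP for all interactions” looks like, here is a sketch that builds the request for storing one document, using only Python’s standard library. It assumes a CouchDB server on localhost at the default port 5984 and a database I’m calling “cards”; both the database name and the document are hypothetical. The request is built but deliberately not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumptions: CouchDB on localhost at its default port 5984, and a
# hypothetical database named "cards".
BASE_URL = "http://localhost:5984"


def build_put_request(database, doc_id, document):
    """Build (but do not send) the HTTP request that stores one document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{database}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_put_request("cards", "card-bob", {"name": "Bob", "fax": "555-0102"})
print(request.get_method(), request.full_url)

# To actually send it against a running server:
#     urllib.request.urlopen(request)
```

There is no database driver here, just a URL, a verb, and some JSON, which is why a browser can talk to CouchDB about as easily as a server can.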

The last thing is the view tier – getting the data back to the user**.  CouchDB uses something called MapReduce, invented at Google, to deal with searching through existing data.  It’s a little tricky, and possibly the only thing preventing me from jumping fully on board with CouchDB.  Instead of SQL queries, you write “view functions” for CouchDB.  The end result is pretty much the same.
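
Since MapReduce is the tricky part, here is a toy imitation in plain Python. This is not how CouchDB runs views (real view functions are written in JavaScript and run inside the database); it only mimics the idea: a map step emits key/value pairs for each document, and a reduce step collapses the values for each key. The documents and field names are invented.

```python
from collections import defaultdict

# Hypothetical blog documents; a real database would hold many more.
documents = [
    {"type": "article", "title": "Hello", "date": "2009-01-02"},
    {"type": "comment", "article": "Hello", "body": "Nice post"},
    {"type": "article", "title": "CouchDB", "date": "2009-01-01"},
    {"type": "comment", "article": "Hello", "body": "Agreed"},
]


def map_comments(doc):
    """Map step: emit one (key, value) pair per comment, keyed by article."""
    if doc.get("type") == "comment":
        yield doc["article"], 1


def reduce_count(values):
    """Reduce step: collapse all values emitted for one key into a total."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Run the map step over every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Results are returned in key order.
    return {key: reduce_fn(grouped[key]) for key in sorted(grouped)}


comment_counts = run_view(documents, map_comments, reduce_count)
print(comment_counts)
```

This does about the same job as a SQL query that counts comments grouped by article, just expressed as two small functions instead of one query string.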

As an aside, one of the coolest things about view functions is that they can be stored in CouchDB directly.  You just give them a unique identifier (like “view-blog-articles-by-date”) and store them as documents in CouchDB.  That means application code can be stored in the database, which is great because backing up the database will back up a good chunk of your code as well.
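
For the curious, here is roughly what a stored view looks like. In CouchDB, views live in “design documents”, whose ids start with `_design/`. The sketch below builds one as a Python dictionary; the names “blog” and “articles_by_date”, and the `type`, `date`, and `title` fields, are my own placeholders. Note that the map function is JavaScript, carried around as an ordinary string.

```python
import json

# The view's map function is JavaScript, stored as a plain string.
map_function = """
function (doc) {
  if (doc.type === "article") {
    emit(doc.date, doc.title);
  }
}
"""

# Hypothetical design document holding a single view.
design_document = {
    "_id": "_design/blog",
    "views": {
        "articles_by_date": {"map": map_function.strip()},
    },
}

# This JSON is what would be saved into the database like any other document.
print(json.dumps(design_document, indent=2))
```

Once a document like this is saved, the view can be queried over HTTP at a path along the lines of `/yourdb/_design/blog/_view/articles_by_date`, where “yourdb” stands in for your database name.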

So in conclusion, a lot of what makes Rails great, and it is great, is made obsolete by this humble open-source application.  I didn’t even get to the built-in scalability that CouchDB has going for it.  CouchDB is a Big Deal, and I’m keeping my eye on it.

–Daniel Tsadok

* To qualify slightly, the concepts introduced by CouchDB are going to be disruptive, whatever form they take (i.e. even if CouchDB itself doesn’t take off).

** I’m fudging the presentation tier and the view tier a bit, in case you’re nitpicky ;-)  This post is not MVC-compliant.


My jQuery Tutorial

As part of my school’s “DriveBy” program, where students teach other students about various topics, I wrote a tutorial for the awesome jQuery library for JavaScript. Check it out!