Feature prioritization for Pillow, the CouchDB shard manager

I have now reached the end of my todo list for Pillow. That doesn’t mean it’s finished and ready to be stamped version 1.0. In its current incarnation it is fully usable and production ready, but to earn a 1.0 it needs to do a bit more.

The current resharding always doubles the number of servers required. Since you may overshard, that doesn’t necessarily mean you have to double the number of physical servers, but you need to organize more CouchDB instances than you might otherwise need. Smoother sharding algorithms such as consistent hashing make it possible to add one server at a time, so Pillow should support this.
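To make the idea concrete, here is a minimal consistent hashing sketch in Ruby. It is not taken from Pillow; the HashRing class, the 64 virtual points per server and the couch1 to couch4 server names are assumptions made up for the illustration.

```ruby
require 'digest/md5'

# Hypothetical illustration of a consistent hashing ring, not Pillow code.
class HashRing
  def initialize(nodes = [], replicas = 64)
    @replicas = replicas # virtual points per server (assumed value)
    @ring = {}           # point on the ring => server name
    @points = []         # sorted ring points, used for lookups
    nodes.each { |node| add(node) }
  end

  # Place a server on the ring at several pseudo-random points.
  def add(node)
    @replicas.times { |i| @ring[point_for("#{node}:#{i}")] = node }
    @points = @ring.keys.sort
  end

  # A document belongs to the first server point at or after its own point,
  # wrapping around to the start of the ring when there is none.
  def node_for(doc_id)
    return nil if @points.empty?
    point = point_for(doc_id)
    index = @points.bsearch_index { |p| p >= point } || 0
    @ring[@points[index]]
  end

  private

  # Map any string to a 32 bit position on the ring.
  def point_for(key)
    Digest::MD5.hexdigest(key)[0, 8].to_i(16)
  end
end

ring = HashRing.new(%w[couch1 couch2 couch3])
ids = (1..1000).map { |i| "doc#{i}" }
before = ids.map { |id| ring.node_for(id) }

ring.add('couch4')
moved = ids.zip(before).count { |id, old| ring.node_for(id) != old }
puts "#{moved} of #{ids.size} documents moved to the new server"
```

The point of the exercise is that documents only ever move to the newly added server, never between the existing ones, so a single server can be added without reshuffling the whole cluster.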

Pillow currently only supports rereducers written in Erlang. It would really be nice to support JavaScript for rereducers. A summing rereducer exists, and mappers without reducers work just like in CouchDB. However, when you have more complex reduction needs, copying the reducer code from your CouchDB into Pillow beats writing (and maintaining) it again in a new language.
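For readers unfamiliar with rereducing across shards, the following Ruby sketch shows the general idea behind a summing rereducer: every shard returns its own reduced rows, and rows with the same key are added together. Pillow’s real rereducers are written in Erlang; the sum_rereduce function and the row layout used here are assumptions for illustration only.

```ruby
# Hypothetical sketch: combine reduced view rows from several shards
# by summing the values that share a key.
def sum_rereduce(shard_results)
  totals = Hash.new(0)
  shard_results.each do |rows|
    rows.each { |row| totals[row['key']] += row['value'] }
  end
  totals
end

shard_a = [{ 'key' => 'apples', 'value' => 3 }]
shard_b = [{ 'key' => 'apples', 'value' => 2 }, { 'key' => 'pears', 'value' => 5 }]

p sum_rereduce([shard_a, shard_b])
```

Anything more complex than a sum needs its own merge logic, which is why being able to reuse the JavaScript reducer from the CouchDB design document would be convenient.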

Pillow should really support the bulk document API of CouchDB. I haven’t used this one myself, but adding support should be pretty straightforward.
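As a rough sketch of why this should be straightforward: a _bulk_docs request body is a JSON object with a docs array, so a shard manager mostly has to split that array per shard and forward one smaller request to each. The Ruby below is not Pillow code. The shard_for routing rule (CRC32 of the document id modulo the shard count) is purely an assumption, and the sketch assumes every document carries an _id.

```ruby
require 'json'
require 'zlib'

# Hypothetical routing rule, used only for this illustration.
def shard_for(doc_id, shard_count)
  Zlib.crc32(doc_id) % shard_count
end

# Split one _bulk_docs request body into one request body per shard.
# Returns a hash of shard number => JSON body for that shard.
def split_bulk_docs(body, shard_count)
  docs = JSON.parse(body).fetch('docs')
  docs.group_by { |doc| shard_for(doc.fetch('_id'), shard_count) }
      .transform_values { |group| JSON.generate('docs' => group) }
end

body = JSON.generate('docs' => [
  { '_id' => 'number1', 'items' => { 'a' => 1 } },
  { '_id' => 'number2', 'items' => { 'b' => 2 } }
])

split_bulk_docs(body, 4).each do |shard, shard_body|
  puts "shard #{shard}: #{shard_body}"
end
```

The responses from the shards would then have to be merged back into a single reply, which this sketch leaves out.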

CouchApp support is harder since it requires JavaScript support and then some. I probably need to play around with a CouchApp or two to find out more, but since I haven’t done so, it’s hard to determine how much work it would take.

While I do hope that there are no non-replicated CouchDB servers in production out there, reality is that there probably are lots. I like the three-way replication minimum myself and with CouchDB’s master-master scheme, it works really well. Pillow however is currently happily ignorant of any replication you have set up. I would really like to have Pillow manage such replication. In addition to managing replication, sets of Pillow servers should be controllable from a random server in the same master-master way ensuring full control of your cluster from any single Pillow node.
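To give an idea of what managing replication could involve, the Ruby sketch below builds the request bodies for a three-way master-master setup: one continuous replication per ordered pair of servers. The source, target and continuous fields are what CouchDB’s _replicate endpoint accepts; the replication_requests helper and the couch-a, couch-b and couch-c host names are made up for the example, and nothing here reflects how Pillow will actually do it.

```ruby
require 'json'

# Hypothetical helper: one replication request body per ordered pair of
# replicas, so that every server replicates to every other server.
def replication_requests(replica_urls, db_name)
  replica_urls.permutation(2).map do |source, target|
    {
      'source' => "#{source}/#{db_name}",
      'target' => "#{target}/#{db_name}",
      'continuous' => true
    }
  end
end

replicas = %w[http://couch-a:5984 http://couch-b:5984 http://couch-c:5984]

# Each body would be POSTed to a CouchDB server's /_replicate endpoint.
replication_requests(replicas, 'objects').each do |body|
  puts JSON.generate(body)
end
```

Three replicas give six replication requests, and the number grows quadratically with the number of replicas, which is a good reason to let a tool keep track of them.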

There is no clear prioritized list right now; all features listed above (and probably more) would be beneficial. However, as I am currently the only one developing Pillow and the time I can spend on Pillow is limited, I have to prioritize. The five features can be grouped:

  • CouchDB API compatibility: JavaScript views, bulk documents, CouchApp
  • Production flexibility and scaling: Consistent Hashing and Replication management

It is not hard to admit that API compatibility is important, but the core of the API is already supported. Production flexibility and scaling are more important for me at the moment, and I will probably focus on them. I also think that replication management is slightly more useful than consistent hashing. Choosing between the API features is harder since I don’t need them myself, but JavaScript views are a prerequisite for CouchApp, and bulk document support is straightforward in comparison to CouchApp, leading to this priority list:

  1. Replication management
  2. Consistent hashing
  3. JavaScript views
  4. Bulk documents
  5. CouchApp

This list is the result of my needs at the time of writing. Others may convince me to adjust the priorities. Better yet, others may jump in and add support for the features they need.


Camping with CouchDB

When developing a new system, getting end-to-end functionality and being able to demonstrate it as soon as possible is important. While doing so, it’s also an added benefit if you do not spend a lot of time writing throwaway code.

I have a set of scripts that let me test and use the system that I am developing from the command line. Since the whole system is written in Ruby, writing a script to allow command line interaction is straightforward. The end result of what I am developing will be a service, so functionality equivalent to the scripts must be available in a browser. With Ruby, there are several ways to bring an application to a browser. The usual suspects are Rails and Merb, and if they don’t work, one can always fall back on WEBrick and write servlets.

Rails and Merb did not seem right since I have an existing CouchDB backed model. I had read about Camping and wanted to test it out, so I did. This was great. From the time I started investigating Camping until I could show data from my model in the browser took only 45 minutes. When I realized that I could use my own model directly rather than use the Camping model, I was just a few lines from the goal. I haven’t given much thought to whether this is a production ready framework, but that doesn’t really matter at the moment since I don’t have to write much unnecessary code. The benefit is reduced development time while retaining full programmatic control of my model.

To show how easy it is to connect CouchDB and Camping, here’s a simple example that, while not particularly useful on its own, should show a pattern that you can use in your own application.

require 'camping'
require 'couchrest'

Camping.goes :MyCamp

I use CouchRest to simplify CouchDB interaction. The magic is of course in the last line, Camping.goes :MyCamp. That line tells Camping to serve the module called MyCamp.
Time to implement the controller. Note that Camping expects to find the controller definitions in MyCamp::Controllers.

module MyCamp::Controllers
  class MyObject < R '/object/(\w+)'

This construct might confuse people, but R is defined by Camping and the parameter is a path with a regexp in the parentheses. This regexp yields the argument to get below. In this case, any HTTP GET request to server/object/number1 will call get('number1'). Perfectly RESTful.
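As an aside that is not part of my_camp.rb, the capture can be tried out in plain Ruby without Camping. The ROUTE constant and the route_argument helper are made up for this illustration, and the sketch assumes that the pattern has to match the whole path.

```ruby
# Stand-alone illustration of the route pattern; this is not Camping code.
ROUTE = %r{\A/object/(\w+)\z}

# Returns the captured part of the path, or nil when the path does not match.
def route_argument(path)
  match = ROUTE.match(path)
  match && match[1]
end

p route_argument('/object/number1')   # => "number1"
p route_argument('/object/number-1')  # => nil, since "-" is not matched by \w
```

Whatever the parentheses capture is what ends up as the id argument of get.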

    def MyObject.set_storage(storage)
      @@storage = storage
    end

This is a way of letting the controller know about our CouchRest model. I could have added a Model encapsulating that, but to me that is just another level of indirection that only serves to add confusion and complicate code maintenance, since the model already exists.

    def get(id)
      @my_object = @@storage.get(id)
      @my_id = id
      render :mymodel
    end
  end
end

This is the method that is called by an HTTP GET request matching the pattern /object/(\w+). render :mymodel results in the execution of the mymodel view.

module MyCamp::Views
  def mymodel
    body do
      h1 "#{@my_id}"
      ul do
        @my_object['items'].each do |field, value|
          li "#{field}: #{value}"
        end
      end
    end
  end
end

A simple view that iterates over the ‘items’ hash and lists each entry as field: value. Note that Camping by default uses Markaby to create HTML programmatically.

db_url = 'http://localhost:5984/'
storage = CouchRest.database("#{db_url}objects")
MyCamp::Controllers::MyObject.set_storage(storage)

Sets up CouchRest to use the CouchDB backend at http://localhost:5984/objects

Run your application with

camping my_camp.rb

If the database contains a document with _id = 'number1' and 'items' = {"a": 1, "b": 2}, your browser will show this at http://localhost:3301/object/number1:

number1

  • a: 1
  • b: 2

There you go, a nice little Camping application backed by CouchDB.

For the official Camping site, go to http://camping.rubyforge.org/

