Feature prioritization for Pillow the CouchDB shard manager

I have now reached the end of my todo list for Pillow. That doesn’t mean it’s finished and ready to be stamped version 1.0. In it’s current incarnation it is fully usable and production ready, but in order to earn a 1.0 it needs to do a bit more.

The current resharding always doubles the number of servers required. Since you may overshard, that doesn’t necessarily mean you have to double the number of physical servers, but you need to organize more CouchDB instances than you might otherwise need. Smoother sharding algorithms that enable addition of single additional servers exist (consistent hashing) so Pillow should support this.

Pillow currently only supports rereducers written in Erlang. It would really be nice to support JavaScript for rereducers. A summing rereducer exists and mappers without reducers works just like in CouchDB. However when you have more complex reduction needs, copying the reducer code from your CouchDB into Pillow beats writing (and maintaining) them again in a new language.

Pillow should really support the bulk document API of CouchDB. I haven’t used this one myself, but adding support should be pretty straightforward.

CouchApp support is harder since it requires JavaScript support and then some. I probably need to play around with a CouchApp or two to find out more, but since I haven’t done so, it’s hard to determine how much work it would take.

While I do hope that there are no non-replicated CouchDB servers in production out there, reality is that there probably are lots. I like the three-way replication minimum myself and with CouchDB’s master-master scheme, it works really well. Pillow however is currently happily ignorant of any replication you have set up. I would really like to have Pillow manage such replication. In addition to managing replication, sets of Pillow servers should be controllable from a random server in the same master-master way ensuring full control of your cluster from any single Pillow node.

There is no clear prioritized list right now, all features listed above (and probably more) would be beneficial. However, as I am currently the only one developing Pillow and the time I can spend on Pillow is limited, I have to prioritize. The five features can be grouped:

  • CouchDB API compatibility: JavaScript views, bulk documents, CouchApp
  • Production flexibility and scaling: Consistent Hashing and Replication management

It is not hard to admit that API compatibility is important, but the core of the API is supported. Production flexibility and scaling is more important for me at the moment and I will probably focus on that. I also think that replication management is slightly more useful than consistent hashing. Choosing between the API features is harder since I don’t need them myself, but JavaScript views is a prerequisite of CouchApp and bulk document support is straightforward in comparison to CouchApp leading to this priority list:

  1. Replication management
  2. Consistent hashing
  3. JavaScript views
  4. Bulk documents
  5. CouchApp

This list is the result of my needs at the time of writing. Others may convince me to adjust the priorities. Better yet, others may jump in and add support for the features they need.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.