Feature prioritization for Pillow, the CouchDB shard manager
Posted: July 14, 2010 Filed under: couchdb | Tags: couchdb, scaling, Storage
I have now reached the end of my to-do list for Pillow. That doesn’t mean it’s finished and ready to be stamped version 1.0. In its current incarnation it is fully usable and production ready, but to earn a 1.0 it needs to do a bit more.
Resharding currently always doubles the number of servers required. Since you may overshard, that doesn’t necessarily mean doubling the number of physical servers, but you do need to run more CouchDB instances than you might otherwise need. Smoother sharding algorithms, such as consistent hashing, allow servers to be added one at a time, so Pillow should support this.
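To illustrate why consistent hashing makes single-server additions cheap, here is a minimal Python sketch of a hash ring. It is not Pillow's code (Pillow is written in Erlang and does not support this yet); the class name `HashRing`, the server names and the choice of 64 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring (stable across runs, unlike hash())."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; each server owns many small arcs."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place `vnodes` points for this server around the ring.
        for i in range(self.vnodes):
            point = ring_position(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def lookup(self, doc_id):
        # A document belongs to the first point clockwise from its own position.
        i = bisect.bisect(self._points, ring_position(doc_id)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["couch1", "couch2", "couch3", "couch4"])
docs = [f"doc-{n}" for n in range(1000)]
before = {d: ring.lookup(d) for d in docs}
ring.add("couch5")  # add ONE server instead of doubling
moved = [d for d in docs if ring.lookup(d) != before[d]]
print(f"{len(moved)} of {len(docs)} documents moved")
```

Every document that changes owner moves to the new server, and on average only about one fifth of the documents move when a fifth server joins. With the doubling scheme, by contrast, about half of the documents end up on a new shard.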
Pillow currently only supports rereducers written in Erlang. It would really be nice to support JavaScript for rereducers. A summing rereducer exists, and mappers without reducers work just like in CouchDB. However, when you have more complex reduction needs, copying the reducer code from your CouchDB into Pillow beats writing (and maintaining) it again in a new language.
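For readers unfamiliar with rereducing across shards, the idea behind a summing rereducer can be sketched in a few lines of Python. This is an illustration only, not Pillow's Erlang implementation; the function name `rereduce_sum` is made up, and the sketch assumes string keys and numeric values in rows shaped like CouchDB's `{"key": ..., "value": ...}` view output.

```python
def rereduce_sum(shard_results):
    """Merge per-shard reduce rows into one row per key by summing the values.

    `shard_results` is a list with one entry per shard; each entry is a list
    of {"key": ..., "value": ...} rows. Keys are assumed to be strings here.
    """
    totals = {}
    for rows in shard_results:
        for row in rows:
            totals[row["key"]] = totals.get(row["key"], 0) + row["value"]
    # Plain Python string ordering, not CouchDB's view collation.
    return [{"key": key, "value": totals[key]} for key in sorted(totals)]


shard_a = [{"key": "apples", "value": 3}, {"key": "pears", "value": 1}]
shard_b = [{"key": "apples", "value": 4}]
print(rereduce_sum([shard_a, shard_b]))
```

Summing is easy because partial sums can be combined in any order. A more complex reducer needs its own combining logic, which is exactly the code you would rather copy from CouchDB than rewrite.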
Pillow should really support the bulk document API of CouchDB. I haven’t used this one myself, but adding support should be pretty straightforward.
CouchApp support is harder since it requires JavaScript support and then some. I probably need to play around with a CouchApp or two to find out more, but since I haven’t done so, it’s hard to determine how much work it would take.
While I do hope that there are no non-replicated CouchDB servers in production out there, the reality is that there are probably lots. I like the three-way replication minimum myself, and with CouchDB’s master-master scheme it works really well. Pillow, however, is currently happily ignorant of any replication you have set up. I would really like to have Pillow manage such replication. In addition to managing replication, sets of Pillow servers should be controllable from any server in the same master-master way, ensuring full control of your cluster from any single Pillow node.
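As a rough idea of what managing three-way master-master replication involves, the sketch below builds the request bodies you would POST to CouchDB's `/_replicate` endpoint so that every server replicates to every other one. It only constructs the JSON bodies and sends nothing; the function name and server URLs are hypothetical, and this is not how Pillow does (or will do) it.

```python
import json
from itertools import permutations


def replication_requests(servers, db):
    """Build one continuous replication request per ordered pair of servers.

    Each dict is a body for POST /_replicate on a CouchDB instance.
    """
    return [
        {"source": f"{src}/{db}", "target": f"{dst}/{db}", "continuous": True}
        for src, dst in permutations(servers, 2)
    ]


# Hypothetical replica set for a single shard.
servers = ["http://couch1:5984", "http://couch2:5984", "http://couch3:5984"]
for body in replication_requests(servers, "users"):
    print(json.dumps(body))
```

Three servers need six one-way replications to be fully master-master, and that has to be repeated for every shard, which is why having Pillow keep track of it is attractive.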
There is no clear prioritized list right now; all the features listed above (and probably more) would be beneficial. However, as I am currently the only one developing Pillow and the time I can spend on it is limited, I have to prioritize. The five features can be grouped:
- CouchDB API compatibility: JavaScript views, bulk documents, CouchApp
- Production flexibility and scaling: Consistent Hashing and Replication management
It is not hard to admit that API compatibility is important, but the core of the API is already supported. Production flexibility and scaling are more important to me at the moment, and I will probably focus on that. I also think that replication management is slightly more useful than consistent hashing. Choosing between the API features is harder since I don’t need them myself, but JavaScript views are a prerequisite for CouchApp, and bulk document support is straightforward in comparison to CouchApp, leading to this priority list:
- Replication management
- Consistent hashing
- JavaScript views
- Bulk documents
- CouchApp
This list is the result of my needs at the time of writing. Others may convince me to adjust the priorities. Better yet, others may jump in and add support for the features they need.
CouchDB to the rescue
Posted: February 17, 2009 Filed under: Uncategorized | Tags: Storage
Got CouchDB installed on my Fedora box. This thing is sweet. Working with a RESTful JSON/HTTP storage system is so much easier than old-fashioned SQL databases. If I were to store users and a list of stuff per user, where this stuff could be shared among several users, in a relational db, I would create a table for the users indexed on userid, a table for the stuff indexed by stuffid and a table of userid to stuffid relations. Then of course I would need lots of boilerplate code to work with the database.
In CouchDB, I would have a database of users where a user doc is stored under /users/<userid>. The stuff would be stored as documents under /stuff/<stuffid> and the relations would be stored either in the user document or in a separate database /userstuff/<userid>/. An important difference is that no matter if the relation information was stored in the user database or in a separate database, the document stored at that location would have to be replaced whenever stuff is added or removed for a user. This makes me prefer putting the relations in a separate database rather than keep updating the user document.
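A small Python sketch of that layout, using plain dicts in place of real HTTP calls. The field name `stuff_ids`, the sample ids and the helper `add_stuff` are made up for illustration; only the three paths come from the description above.

```python
# Documents as they might be stored, keyed by their CouchDB path.
user_doc = {"_id": "alice", "name": "Alice"}       # /users/alice
stuff_doc = {"_id": "book-17", "title": "A book"}  # /stuff/book-17
relation_doc = {"_id": "alice", "stuff_ids": []}   # /userstuff/alice


def add_stuff(relation, stuff_id):
    """Return the replacement relation document with stuff_id added.

    CouchDB has no partial update: the whole document is PUT again, so a
    new dict is built here instead of mutating the old one.
    """
    ids = list(relation.get("stuff_ids", []))
    if stuff_id not in ids:
        ids.append(stuff_id)
    return {**relation, "stuff_ids": ids}


updated = add_stuff(relation_doc, stuff_doc["_id"])
print(updated)   # the small relation doc is what gets replaced
print(user_doc)  # the user doc itself is never touched
```

With the relations in their own database, only the small relation document is rewritten when stuff is added or removed, and the user document keeps its revision.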
It was hard to believe that you could get a simpler interface than pure HTTP, but I still had to try CouchRest by Chris Anderson. It made working with CouchDB even easier. With all this stuff in place, development is a breeze since I don’t have to spend time on the nitty-gritty, repetitive, low-level boilerplate stuff.