cu3w0rx. Lovely name, eh? Moving on …
In this series we will implement a simple Map-Reduce framework that closely models the design and implementation of CloudCrowd, which is written in Ruby. CloudCrowd has some nice design choices. Specifically, its small size (~1,800 LOC), use of JSON for message transport, and emphasis on HTTP as the protocol are quite nice. It is also pretty to look at, and the interface is entirely AJAX driven, so it relies on the same service calls as the rest of the suite. I think it is worth our while to set the stage for the project. Here is a list of things we will be taking directly from CloudCrowd:
- The App Engine application will serve as the central master resource.
- It will serve as the master work queue. All jobs will be submitted to it for processing.
- All communication will be via JSON messages. As in CC, the web site will make use of the JSON returned from AJAX requests to the resource handlers.
- There will be a clear specification for the top-level properties of a Job message, but handlers will be responsible for vetting the provided options to handle the request.
- Operations on inputs are assumed to happen on local disk. That is, jobs will be staged onto the worker nodes’ scratch space.
- Jobs will have the option of a callback URL on success.
- For simplicity’s sake, authentication to the master, and between master and slaves, will use the same HTTP Basic authentication credentials.
- A worker node will accept work item requests based on the machine’s load.
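To make the Job message and load-check ideas a bit more concrete, here is a rough Python 3 sketch. The field names (`action`, `inputs`, `options`, `callback_url`) and the `max_load` threshold are my own placeholders, not CloudCrowd's actual schema, and nothing here is App Engine specific. The check only covers the top-level shape of the message; vetting `options` is left to whichever handler runs the job.

```python
import json
import os

# Hypothetical top-level Job properties; the real specification is still to come.
REQUIRED_FIELDS = {"action": str, "inputs": list}
OPTIONAL_FIELDS = {"options": dict, "callback_url": str}


def parse_job(raw):
    """Validate only the top-level shape of a JSON Job message.

    The contents of "options" are not inspected here; that is the
    responsibility of the handler for the requested action.
    """
    job = json.loads(raw)
    if not isinstance(job, dict):
        raise ValueError("job message must be a JSON object")
    for name, kind in REQUIRED_FIELDS.items():
        if not isinstance(job.get(name), kind):
            raise ValueError("missing or malformed field: %s" % name)
    for name, kind in OPTIONAL_FIELDS.items():
        if name in job and not isinstance(job[name], kind):
            raise ValueError("malformed field: %s" % name)
    job.setdefault("options", {})
    return job


def can_accept_work(max_load=1.0, load=None):
    """Return True when the machine's load is below max_load.

    By default the one-minute load average is used (Unix only);
    passing load explicitly makes the check easy to test.
    """
    if load is None:
        load = os.getloadavg()[0]
    return load < max_load
```

A worker would call `can_accept_work()` before asking the master for a work item, and the master would run something like `parse_job()` on every submission before queueing it.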
There will be several points that stray from CloudCrowd's implementation as well:
- Nodes will publish their capabilities to the master. It will not be assumed that all nodes have all the same capabilities. In this respect cu3w0rx will more closely resemble the nanite project (Ruby + Erlang).
- Worker nodes will not share the same code base as the master. The major reason for this is that worker nodes will not run on GAE, hence it makes no sense to hamstring them with the restraints that GAE imposes on Python.
- Workers will maintain their own state in a local database. This will serve to keep track of capabilities, number of jobs processed, monitoring statistics, and results from previous jobs.
- Map and reduce are implemented as two separate jobs, as far as the worker nodes are concerned.
- Since this is a demonstration project, we will not implement a scheme to save result files to non-volatile storage. Instead we will provide a way to give authenticated access to result files from worker nodes. Result files from Reduce phases will live only at the final destination host (e.g. the hosts that have checked back into the master as having successfully completed a particular job).
- The queue is not necessarily FIFO. I am actually not sure if this is also true of CloudCrowd, but it's worth mentioning here.
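One way to picture how published capabilities lead to a queue that is not strictly FIFO is the small in-memory sketch below. It is only an illustration in plain Python 3: the `JobQueue` class, its method names, and the `action` field are hypothetical, and a real master on App Engine would keep this state in the datastore rather than in a list.

```python
class JobQueue:
    """In-memory stand-in for the master's work queue (illustration only)."""

    def __init__(self):
        self._pending = []        # jobs in submission order
        self._capabilities = {}   # node name -> set of action names

    def publish(self, node, actions):
        """Record the actions a worker node says it can perform."""
        self._capabilities[node] = set(actions)

    def submit(self, job):
        """Append a job (a dict with at least an "action" key)."""
        self._pending.append(job)

    def next_job_for(self, node):
        """Return the oldest pending job this node can run, or None.

        Older jobs the node cannot handle are skipped and stay queued,
        which is why the queue as a whole is not strictly FIFO.
        """
        capable = self._capabilities.get(node, set())
        for index, job in enumerate(self._pending):
            if job["action"] in capable:
                return self._pending.pop(index)
        return None
```

With map and reduce treated as separate jobs, a node that has only published `"reduce"` would be handed a later reduce job while an earlier map job keeps waiting for a node that can run it.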
I would also like to implement a way to provision and configure cloud VMs using the excellent libcloud library, but I think that is outside the scope of a demonstration project. If you see anything missing (or think some stuff can be left out) leave a comment!
