I didn't want to go down the same route, because my idea for the MapReduce library is that it should be lightweight and easy to deploy. I don't want it to depend on complex configuration across multiple machines, and I want it to be easy to use on all platforms, including Windows, without a dependent software stack or configuration overhead.
My solution is an embedded DFS: the DFS infrastructure is bound into the client application and runs without configuration. I want to be able to build a MapReduce program, run it on any number of machines in a network, and have it "just work". No configuration, no messing about; the subsystem takes care of it all.
Will this bloat client applications? No. The subsystems for MapReduce and DFS are very small, so the footprint overhead is minimal.
Early prototypes have proved the concept: I can run multiple instances on multiple machines, and they all find each other, communicate with each other, and cope when one or more become unavailable, whether they were shut down cleanly or forced closed.
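
How the instances keep track of one another is not described here. Purely as an illustration, one common approach is a heartbeat table: each instance records when it last heard from a peer, removes a peer immediately on a clean shutdown message, and drops it after a timeout if it simply goes silent. The sketch below assumes that scheme; the `PeerTable` class, its method names, and the five-second timeout are all hypothetical and not taken from the prototype.

```python
import time
from typing import Callable, Dict, List


class PeerTable:
    """Hypothetical membership table: tracks peers believed to be alive."""

    def __init__(self, timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout          # assumed silence limit, in seconds
        self.clock = clock              # injectable so the logic can be tested
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, peer_id: str) -> None:
        """Record that a peer announced itself (also covers first discovery)."""
        self._last_seen[peer_id] = self.clock()

    def goodbye(self, peer_id: str) -> None:
        """Clean shutdown: the peer told us it is leaving."""
        self._last_seen.pop(peer_id, None)

    def live_peers(self) -> List[str]:
        """Drop peers that have been silent too long (forced close), then list the rest."""
        now = self.clock()
        expired = [p for p, seen in self._last_seen.items()
                   if now - seen > self.timeout]
        for p in expired:
            del self._last_seen[p]
        return sorted(self._last_seen)
```

In a real deployment the heartbeats would arrive over the network (for example by broadcast or multicast), which is what would remove the need for a configured list of machines; the table itself only covers the bookkeeping side.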