<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4377882162568277664</id><updated>2011-12-01T12:57:21.387Z</updated><category term='defect management'/><category term='articles'/><category term='crash'/><category term='parallel programming'/><category term='New Home'/><category term='personal'/><category term='boost c++'/><category term='MapReduce'/><category term='debugging'/><category term='Distributed File System'/><category term='book'/><category term='Google'/><category term='library'/><category term='software development'/><category term='C++'/><category term='criticism'/><category term='websites'/><category term='Stonebraker'/><category term='inbox'/><category term='compilation'/><category term='user interface'/><category term='software engineering'/><category term='Hadoop'/><category term='concept'/><category term='source code'/><category term='DeWitt'/><category term='email'/><category term='code'/><category term='programming languages'/><category term='Boost'/><title type='text'>software thoughts</title><subtitle type='html'>musings on software technology</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>29</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-3780825995838014988</id><published>2011-11-30T15:04:00.000Z</published><updated>2011-11-30T22:13:23.413Z</updated><title type='text'>Automating C++ Unit Tests in MSVC</title><content type='html'>Test Driven Development using C++ can be more challenging than in other more trendy languages. But, with a bit of creativity and know-how, it really isn't too difficult.&lt;br /&gt;&lt;br /&gt;In a recent project I wanted to follow a more formal TDD process than I usually would. I always write defensive code littered with assertions, etc., but I usually stop short of a full unit test suite for all my code.&lt;br /&gt;&lt;br /&gt;Being such a fan of the Boost libraries, naturally I turned to &lt;a href="http://boost.org/"&gt;boost.org&lt;/a&gt; and the Boost.Test library. It's very complete, if a bit complex in its naming conventions, but I won't cover the library here; you can read the documentation for yourself &lt;a href="http://www.boost.org/doc/libs/1_48_0/libs/test/doc/html/"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Let's talk about how to configure an MSVC solution for automated unit testing. 
It turns out not to be very difficult, but it did take some trial-and-error.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Goals&lt;/h3&gt;With an automated unit test suite, I am trying to achieve a set of tests that satisfy these goals:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;the tests should run as a part of the build process&lt;/li&gt;&lt;li&gt;if the application fails to build, don't run the tests&lt;/li&gt;&lt;li&gt;when building the project, the tests should be re-run even if the application hasn't changed&lt;/li&gt;&lt;li&gt;the results of the unit tests should be available in the Output Window with the compiler output&lt;/li&gt;&lt;li&gt;if any of the tests fail, the build should fail&lt;/li&gt;&lt;li&gt;if any of the tests fail, the failing line should be easily accessible in the IDE&lt;/li&gt;&lt;/ol&gt;&lt;h3&gt;Method&lt;/h3&gt;I have two projects in my Solution. The first (Project1) is the application I'm developing (in a complex app, there'll obviously be more projects for DLLs, etc.); the second (Project2) is the unit test application.&lt;b&gt; (Goal 1)&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In the Project Dependencies, set Project2 to be dependent on Project1. This will ensure the build order so the application builds before the unit tests.&lt;b&gt; (Goal 2)&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;We'll use a file on disk to control the running of the tests, using a file-dependency mechanism. 
Before running the tests, we'll delete the file...&lt;br /&gt;In Project Properties, Build Events, Pre-Build Event, set Command Line to&lt;br /&gt;&lt;pre&gt;del "$(SolutionDir)$(Configuration)\$(ProjectName).txt"&lt;br /&gt;&lt;/pre&gt;Now, we'll use a Custom Build Step to invoke the test execution after the test suite has compiled.&lt;br /&gt;&lt;br /&gt;In Project Properties, Custom Build Step, set Command Line to (all on one line)&lt;br /&gt;&lt;pre&gt;"$(SolutionDir)$(Configuration)\$(ProjectName).exe" &amp;amp;&amp;amp;&lt;br /&gt;echo Ok &amp;gt; "$(SolutionDir)$(Configuration)\$(ProjectName).txt"&lt;br /&gt;&lt;/pre&gt;This trick will run the freshly built test suite application and, if it succeeds, will write "Ok" to a text file. Now, we have to tell MSVC where the file is that we've written to:&lt;br /&gt;Set Outputs to&lt;br /&gt;&lt;pre&gt;$(SolutionDir)$(Configuration)\$(ProjectName).txt&lt;br /&gt;&lt;/pre&gt;MSVC will use this file dependency to determine whether the test suite succeeded or not. On success, the file contains "Ok", but if the test suite fails, the file will not be written and won't exist because of the delete in the pre-build step. &lt;b&gt;(Goal 3)&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Unit Tests&lt;/h3&gt;Using Boost.Test, a very simple example test suite is shown below. Output of the running tests is sent to stdout, which will be captured by MSVC and written to the Output Window. &lt;b&gt;(Goal 4)&lt;/b&gt; If any tests fail, the Boost.Test executable returns a non-zero exit code, which will stop the build. 
&lt;b&gt;(Goal 5)&lt;/b&gt;&lt;br /&gt;&lt;pre&gt;#define BOOST_TEST_MODULE fileparts&lt;br /&gt;#include "boost/test/included/unit_test.hpp"&lt;br /&gt;&lt;br /&gt;BOOST_AUTO_TEST_CASE( free_test_function )&lt;br /&gt;{&lt;br /&gt;    BOOST_CHECK(false);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;This test suite will always fail, producing the output below:&lt;br /&gt;&lt;pre&gt;1&amp;gt;CustomBuildStep:&lt;br /&gt;1&amp;gt;  Description: Performing Custom Build Step&lt;br /&gt;1&amp;gt;  Running 1 test case...&lt;br /&gt;1&amp;gt;  testsuite.cpp(13): error in "free_test_function":check false failed&lt;br /&gt;1&amp;gt;  &lt;br /&gt;1&amp;gt;  *** 1 failure detected in test suite "fileparts"&lt;br /&gt;&lt;/pre&gt;The error is in the correct format to enable you to double-click this line in the Output Window and jump to the failing check. &lt;b&gt;(Goal 6)&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Conclusion&lt;/h3&gt;This is a neat implementation with a few tricks and hacks that make a streamlined Unit Test environment for TDD in C++.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-3780825995838014988?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/3780825995838014988/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2011/11/automating-c-unit-tests-in-msvc.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/3780825995838014988'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/3780825995838014988'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2011/11/automating-c-unit-tests-in-msvc.html' title='Automating C++ Unit Tests in 
MSVC'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-6107602255562703138</id><published>2010-12-03T14:30:00.002Z</published><updated>2011-10-17T15:25:12.987+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='inbox'/><category scheme='http://www.blogger.com/atom/ns#' term='email'/><category scheme='http://www.blogger.com/atom/ns#' term='concept'/><category scheme='http://www.blogger.com/atom/ns#' term='websites'/><category scheme='http://www.blogger.com/atom/ns#' term='user interface'/><title type='text'>Email Inbox Concept</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;I have created a new and, I hope, innovative user interface for a &lt;a href="http://craighenderson.co.uk/inboxconcept/"&gt;web based email reader&lt;/a&gt;, and&amp;nbsp;&lt;/span&gt;I would be very interested to hear your views. Please leave your comments on this blog.&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="color: blue; font-family: inherit;"&gt;&lt;b&gt;What is so different?&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;The &lt;i&gt;inbox concept&lt;/i&gt; is a web based email reader with a clean, crisp user interface and smooth animated navigation. Emails are organised into pages, and sorted by date. 
Ordering is slightly looser than strictly by date to optimise the use of the space that is available.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;&lt;b&gt;Features&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The size and shape of each email message changes dynamically based on attributes of the message, such as the length of the subject text, the number of messages in the group &amp;amp; whether image attachments are present&lt;/li&gt;&lt;li&gt;Smooth animated navigation&lt;/li&gt;&lt;li&gt;Attached images are previewed in the main inbox view&lt;/li&gt;&lt;li&gt;Message groups; collation of messages with the same subject from multiple recipients helps to manage your inbox&lt;/li&gt;&lt;li&gt;Multiple messages (conversations) are grouped and shown in &lt;a href="http://docs.jquery.com/UI/Accordion"&gt;an accordion&lt;/a&gt; user interface.&lt;/li&gt;&lt;li&gt;Unread emails are shown with blue text&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;&lt;b&gt;Try it&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;To try the inbox concept, follow &lt;a href="http://craighenderson.co.uk/inboxconcept/"&gt;this link&lt;/a&gt; and read the emails in the inbox. You can send emails to &lt;a href="mailto:inboxconcept@gmail.com"&gt;inboxconcept@gmail.com&lt;/a&gt; and see them appear in the inbox.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: blue; font-family: inherit;"&gt;&lt;b&gt;Limitations&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;This is a proof of concept, and not a fully functional email application. The Email Inbox Concept is read-only, and does not allow for sending, replying to, or forwarding emails. 
Emails are not marked as 'read' after they are opened in the UI.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;The concept is restricted to the demonstration inbox and users cannot connect to their personal email from this application.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: blue; font-family: inherit;"&gt;&lt;b&gt;Disclaimer&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;The inbox concept is open to the entire world wide web. Any email messages sent to inboxconcept@gmail.com will appear on the site, and may therefore be indexed by and appear in search engine results from Google, Yahoo!, Bing, and others. Please don't send emails to the account unless you are happy that the content is freely accessible.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-6107602255562703138?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/6107602255562703138/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2010/12/email-inbox-concept.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6107602255562703138'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6107602255562703138'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2010/12/email-inbox-concept.html' title='Email Inbox Concept'/><author><name>Craig 
Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-2594512178077652563</id><published>2010-10-11T11:27:00.000+01:00</published><updated>2010-10-11T11:27:13.415+01:00</updated><title type='text'>MapReduce with an embedded distributed file system</title><content type='html'>When I wrote my &lt;a href="http://craighenderson.co.uk/mapreduce"&gt;C++ MapReduce library&lt;/a&gt;, I had quite a bit of interest from potential users. One of the most frequent questions that came up was about scaling across multiple machines. To do this, there is a requirement for a lot of infrastructure to manage the execution of &lt;a href="http://craighenderson.co.uk/papers/software_scalability_mapreduce"&gt;MapReduce&lt;/a&gt; Jobs, merge and sort intermediate results, manage cross-machine communication and provide resilience in the case of machine failures. One of the biggest components is the distribution of data files to be processed and consolidation of result data files across the network. This is typically done with a subsystem called the Distributed File System. Google's original MapReduce implementation used GFS, the Google File System. Hadoop has HDFS, the Hadoop Distributed File System.&lt;br /&gt;&lt;br /&gt;I didn't want to go down the same route, because my idea for the MapReduce library is that it should be lightweight and easy to deploy. 
I don't want a dependency on complex configuration across multiple machines, and I want it to be easy to use on all platforms, including Windows, without a dependent software stack and configuration overhead.&lt;br /&gt;&lt;br /&gt;My solution to this is an Embedded DFS, so the DFS infrastructure is bound into the client application and runs without configuration. I want to be able to build a MapReduce program and run it on any number of machines in a network and it will "just work". No configuration, no messing, the subsystem takes care of it all.&lt;br /&gt;&lt;br /&gt;Will this bloat client applications? No. The subsystems for MapReduce and DFS are very small, so the footprint overhead is minimal.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;Early prototypes have proved the concept, and I can run multiple instances on multiple machines and they all find each other, communicate with each other and cope when one or more are unavailable, either by being shut down cleanly or with a forced close.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-2594512178077652563?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/2594512178077652563/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2010/10/mapreduce-with-embedded-distributed.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/2594512178077652563'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/2594512178077652563'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2010/10/mapreduce-with-embedded-distributed.html' title='MapReduce with an embedded distributed file 
system'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-5145207172605315298</id><published>2010-04-27T19:18:00.001+01:00</published><updated>2010-04-27T19:19:36.673+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Hadoop'/><title type='text'>Google grants license for Apache Hadoop</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; line-height: 17px;"&gt;&lt;a href="http://www.h-online.com/open/news/item/Google-grants-license-for-Apache-Hadoop-987722.html"&gt;The H&lt;/a&gt; reports that Google has&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; line-height: 17px;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; line-height: 17px;"&gt;&lt;a href="http://mail-archives.apache.org/mod_mbox/hadoop-general/201004.mbox/%3C121803A3-CFB9-489B-96EF-027234E55D25@apache.org%3E" rel="external" style="color: #666666; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" target="_blank"&gt;granted a license&lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; line-height: 17px;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; 
line-height: 17px;"&gt;for one of its patents to the&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; line-height: 17px;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; line-height: 17px;"&gt;&lt;a href="http://hadoop.apache.org/" rel="external" style="color: #666666; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" target="_blank"&gt;Apache Hadoop&lt;/a&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; line-height: 17px;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Helvetica, sans-serif; font-size: 14px; line-height: 17px;"&gt;open source framework for distributed computing.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-5145207172605315298?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/5145207172605315298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2010/04/google-grants-license-for-apache-hadoop.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5145207172605315298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5145207172605315298'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2010/04/google-grants-license-for-apache-hadoop.html' title='Google grants license for Apache Hadoop'/><author><name>Craig 
Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-8354855794947048006</id><published>2010-04-14T19:38:00.000+01:00</published><updated>2010-04-14T19:38:03.359+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='parallel programming'/><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='software engineering'/><title type='text'>Software Scalability with MapReduce article published</title><content type='html'>I have published a new article about &lt;a href="http://craighenderson.co.uk/papers/software_scalability_mapreduce/"&gt;Software Scalability with MapReduce&lt;/a&gt; on my website.&lt;br /&gt;&lt;br /&gt;This substantial article introduces some of the concepts of MapReduce and how they can be applied to scaling software even on a single machine. 
You don't need datacenters and large clusters to be able to take advantage of MapReduce.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-8354855794947048006?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/8354855794947048006/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2010/04/software-scalability-with-mapreduce.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/8354855794947048006'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/8354855794947048006'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2010/04/software-scalability-with-mapreduce.html' title='Software Scalability with MapReduce article published'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-4104630787896317935</id><published>2010-04-14T09:00:00.001+01:00</published><updated>2010-04-14T09:10:53.140+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><category scheme='http://www.blogger.com/atom/ns#' term='book'/><title type='text'>Book Review - Computing for Numerical Methods using Visual C++</title><content type='html'>I have &lt;a 
href="http://craighenderson.co.uk/papers/computing_for_numerical_methods/"&gt;published online&lt;/a&gt; my book review of Computing for Numerical Methods using Visual C++ by Shaharuddin Salleh, Albert Y. Zomaya and Sakhinah Abu Bakar. I wrote this review in August 2009 for the &lt;a href="http://www.sigsoft.org/" target="_blank"&gt;ACM Special Interest Group on Software Engineering&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-4104630787896317935?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/4104630787896317935/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2010/04/book-review-computing-for-numerical.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/4104630787896317935'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/4104630787896317935'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2010/04/book-review-computing-for-numerical.html' title='Book Review - Computing for Numerical Methods using Visual C++'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-1216082955048218915</id><published>2010-03-27T15:56:00.000Z</published><updated>2010-03-27T15:56:31.496Z</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='book'/><title type='text'>MapReduce book available online (in draft)</title><content type='html'>An interesting book on MapReduce has been made available in draft form, called &lt;a href="http://www.umiacs.umd.edu/~jimmylin/book.html"&gt;Data-Intensive Text Processing with MapReduce&lt;/a&gt;. I haven't read the whole thing in detail, but I like the author's writing style from the parts I have read, and look forward to reading the rest.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-1216082955048218915?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/1216082955048218915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2010/03/mapreduce-book-available-online-in.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/1216082955048218915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/1216082955048218915'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2010/03/mapreduce-book-available-online-in.html' title='MapReduce book available online (in draft)'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' 
src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-4045852668149452313</id><published>2010-01-20T14:33:00.000Z</published><updated>2010-01-20T14:33:36.075Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><title type='text'>MapReduce patent granted to Google</title><content type='html'>Google has been granted a patent for MapReduce. Patent &lt;a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&amp;Sect2=HITOFF&amp;d=PALL&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&amp;r=1&amp;f=G&amp;l=50&amp;s1=7,650,331.PN.&amp;OS=PN/7,650,331&amp;RS=PN/7,650,331"&gt;7650331&lt;/a&gt;, entitled "System and method for efficient large-scale data processing", was filed in June 2004 and granted on 19th January 2010.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-4045852668149452313?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/4045852668149452313/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2010/01/mapreduce-patent-granted-to-google.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/4045852668149452313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/4045852668149452313'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2010/01/mapreduce-patent-granted-to-google.html' title='MapReduce patent granted to Google'/><author><name>Craig 
Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-586306573777413228</id><published>2009-11-28T12:08:00.000Z</published><updated>2009-11-28T12:08:28.521Z</updated><title type='text'>Wordle of this Blog</title><content type='html'>I love &lt;a href="http://www.wordle.net" target="_blank"&gt;Wordle&lt;/a&gt;. Here's one of this blog.&lt;br/&gt;&lt;br /&gt;&lt;applet name="wordle" mayscript="mayscript" codebase="http://wordle.appspot.com" code="wordle.WordleApplet.class" archive="/j/v1246/wordle.jar" width="400" height="300"&gt;&lt;param name="saved" value="wrdl/1388576/Craig%27s_Blog"/&gt;&lt;param name="font" value="Kenyan Coffee"/&gt;&lt;param name="background" value="0x7c3c18"/&gt;Your browser does not seem to understand the APPLET tag. You need to install and enable the &lt;a href="http://java.com/"&gt;Java&lt;/a&gt; plugin. 
&lt;/applet&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-586306573777413228?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/586306573777413228/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/11/wordle-of-this-blog.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/586306573777413228'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/586306573777413228'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/11/wordle-of-this-blog.html' title='Wordle of this Blog'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-6748089696085573760</id><published>2009-11-20T16:14:00.002Z</published><updated>2010-03-27T20:23:32.523Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='boost c++'/><title type='text'>Boost 1.41.0 Released</title><content type='html'>Version 1.41.0 of Boost C++ libraries has been released. 
See &lt;a href="http://www.boost.org/users/news/version_1_41_0"&gt;the Boost web site&lt;/a&gt; for the release notes.</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/6748089696085573760/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/11/boost-1410-released.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6748089696085573760'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6748089696085573760'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/11/boost-1410-released.html' title='Boost 1.41.0 Released'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-5107209724282154931</id><published>2009-11-05T19:52:00.001Z</published><updated>2009-11-05T19:52:26.680Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='software engineering'/><title type='text'>Software Engineering: An Idea Whose Time Has Come and Gone?</title><content type='html'>Tom DeMarco updates his views on software engineering metrics, 27 years on from his 1982 book "Controlling Software Projects: Management, Measurement, and Estimation":&lt;br /&gt;&lt;br /&gt;&lt;a target="_blank" 
href="http://www.computer.org/cms/Computer.org/ComputingNow/homepage/2009/0709/rW_SO_Viewpoints.pdf"&gt;http://www.computer.org/cms/Computer.org/ComputingNow/homepage/2009/0709/rW_SO_Viewpoints.pdf&lt;/a&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/5107209724282154931/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/11/software-engineering-idea-whose-time.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5107209724282154931'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5107209724282154931'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/11/software-engineering-idea-whose-time.html' title='Software Engineering: An Idea Whose Time Has Come and Gone?'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-2872686103170722933</id><published>2009-11-01T14:31:00.003Z</published><updated>2009-11-01T21:15:19.426Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='DeWitt'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='criticism'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Stonebraker'/><title type='text'>DeWitt and Stonebraker's "MapReduce: A major step backwards"</title><content type='html'>In January 2008 David DeWitt and Michael Stonebraker&amp;nbsp;published an article criticising MapReduce. As far as I can find, the article is no longer available from the original authors. I did find what I believe to be an accurate re-publication at&amp;nbsp;&lt;a href="http://www.yjanboo.cn/?p=237"&gt;http://www.yjanboo.cn/?p=237&lt;/a&gt;, and I reproduce it below from this source, unedited.&lt;br /&gt;The views expressed are those of the original authors, and not my own. I do not wish to be drawn into the debate and therefore pass no comment, but as a MapReduce researcher and author I want to provide a long-term reference to the original article, which is unfortunately no longer available less than two years after publication.&lt;br /&gt;&lt;blockquote&gt;&lt;div class="MsoNormal" style="margin-bottom: 3.85pt; margin-left: 0in; margin-right: 0in; margin-top: .05in; mso-line-height-alt: 9.6pt; mso-outline-level: 1;"&gt;&lt;b&gt;&lt;span style="color: #333333; font-family: Tahoma, sans-serif;"&gt;MapReduce: A major step backwards&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;"&gt;&lt;span lang="EN-GB" style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt; line-height: 115%;"&gt;David J. DeWitt and Michael Stonebraker&lt;/span&gt;&lt;span lang="EN-GB" style="font-size: 16pt; line-height: 115%;"&gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span lang="EN-GB" style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt; line-height: 115%;"&gt;&lt;a href="http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html"&gt;http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html&lt;/a&gt; (no longer available). 
Online 1/11/09 at &lt;a href="http://www.yjanboo.cn/?p=237"&gt;http://www.yjanboo.cn/?p=237&lt;/a&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;On January 8, a Database Column reader asked for our views on new distributed database research efforts, and we’ll begin here with our views on MapReduce. This is a good time to discuss it, since the recent trade press has been filled with news of the revolution of so-called “cloud computing.” This paradigm entails harnessing large numbers of (low-end) processors working in parallel to solve a computing problem. In effect, this suggests constructing a data center by lining up a large number of “jelly beans” rather than utilizing a much smaller number of high-end servers.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;For example, IBM and Google have announced plans to make a 1,000 processor cluster available to a few select universities to teach students how to program such clusters using a software tool called MapReduce [1]. 
Berkeley has gone so far as to plan on teaching their freshman how to program using the MapReduce framework.&lt;br /&gt;As both educators and researchers, we are amazed at the hype that the MapReduce proponents have spread about how it represents a paradigm shift in the development of scalable, data-intensive applications. MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;A giant step backward in the programming paradigm for large-scale data intensive applications&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;A sub-optimal implementation, in that it uses brute force instead of indexing&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Not novel at all — it represents a specific implementation of well known techniques developed 
nearly 25 years ago&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Missing most of the features that are routinely included in current DBMS&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Incompatible with all of the tools DBMS users have come to depend on&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;First, we will briefly discuss what MapReduce is; then we will go into more detail about our five reactions listed above.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;What is 
MapReduce?&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;The basic idea of MapReduce is straightforward. It consists of two programs that the user writes called map and reduce plus a framework for executing a possibly large number of instances of each program on a compute cluster.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;The map program reads a set of “records” from an input file, does any desired filtering and/or transformations, and then outputs a set of records of the form (key, data). As the map program produces output records, a “split” function partitions the records into M disjoint buckets by applying a function to the key of each output record. This split function is typically a hash function, though any deterministic function will suffice. When a bucket fills, it is written to disk. 
The map program terminates with M output files, one for each bucket.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;In general, there are multiple instances of the map program running on different nodes of a compute cluster. Each map instance is given a distinct portion of the input file by the MapReduce scheduler to process. If N nodes participate in the map phase, then there are M files on disk storage at each of N nodes, for a total of N * M files; Fi,j, 1 ≤ i ≤ N, 1 ≤ j ≤ M.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;The key thing to observe is that all map instances use the same hash function. Hence, all output records with the same hash value will be in corresponding output files.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;The second phase of a MapReduce job executes M instances of the reduce program, Rj, 1 ≤ j ≤ M. 
The input for each reduce instance Rj consists of the files Fi,j, 1 ≤ i ≤ N. Again notice that all output records from the map phase with the same hash value will be consumed by the same reduce instance — no matter which map instance produced them. After being collected by the map-reduce framework, the input records to a reduce instance are grouped on their keys (by sorting or hashing) and feed to the reduce program. Like the map program, the reduce program is an arbitrary computation in a general-purpose language. Hence, it can do anything it wants with its records. For example, it might compute some additional function over other data fields in the record. Each reduce instance can write records to an output file, which forms part of the “answer” to a MapReduce computation.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;To draw an analogy to SQL, map is like the group-by clause of an aggregate query. 
Reduce is analogous to the aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;We now turn to the five concerns we have with this computing paradigm.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;1. MapReduce is a step backwards in database access&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;As a data processing paradigm, MapReduce represents a giant step backwards. 
The database community has learned the following three lessons from the 40 years that have unfolded since IBM first released IMS in 1968.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Schemas are good.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Separation of the schema from the application is good.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;High-level access languages are good.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;MapReduce has learned none of these lessons 
and represents a throw back to the 1960s, before modern DBMSs were invented.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;The DBMS community learned the importance of schemas, whereby the fields and their data types are recorded in storage. More importantly, the run-time system of the DBMS can ensure that input records obey this schema. This is the best way to keep an application from adding “garbage” to a data set. MapReduce has no such functionality, and there are no controls to keep garbage out of its data sets. A corrupted MapReduce dataset can actually silently break all the MapReduce applications that use that dataset.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;It is also crucial to separate the schema from the application program. If a programmer wants to write a new application against a data set, he or she must discover the record structure. In modern DBMSs, the schema is stored in a collection of system catalogs and can be queried (in SQL) by any user to uncover such structure. In contrast, when the schema does not exist or is buried in an application program, the programmer must discover the structure by an examination of the code. 
Not only is this a very tedious exercise, but also the programmer must find the source code for the application. This latter tedium is forced onto every MapReduce programmer, since there are no system catalogs recording the structure of records — if any such structure exists.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;During the 1970s the DBMS community engaged in a “great debate” between the relational advocates and the Codasyl advocates. One of the key issues was whether a DBMS access program should be written:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;By stating what you want - rather than presenting an algorithm for how to get it (relational view)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;By presenting an algorithm for data access (Codasyl view)&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 
10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;The result is now ancient history, but the entire world saw the value of high-level languages and relational systems prevailed. Programs in high-level languages are easier to write, easier to modify, and easier for a new person to understand. Codasyl was rightly criticized for being “the assembly language of DBMS access.” A MapReduce programmer is analogous to a Codasyl programmer — he or she is writing in a low-level language performing low-level record manipulation. Nobody advocates returning to assembly language; similarly nobody should be forced to program in MapReduce.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;MapReduce advocates might counter this argument by claiming that the datasets they are targeting have no schema. We dismiss this assertion. In extracting a key from the input data set, the map function is relying on the existence of at least one data field in each input record. 
The same holds for a reduce function that computes some value from the records it receives to process.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Writing MapReduce applications on top of Google’s BigTable (or Hadoop’s HBase) does not really change the situation significantly. By using a self-describing tuple format (row key, column name, {values}) different tuples within the same table can actually have different schemas. In addition, BigTable and HBase do not provide logical independence, for example with a view mechanism. Views significantly simplify keeping applications running when the logical schema changes.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;2. MapReduce is a poor implementation&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;All modern DBMSs use hash or B-tree indexes to accelerate access to data. 
If one is looking for a subset of the records (e.g., those employees with a salary of 10,000 or those in the shoe department), then one can often use an index to advantage to cut down the scope of the search by one to two orders of magnitude. In addition, there is a query optimizer to decide whether to use an index or perform a brute-force sequential search.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;MapReduce has no indexes and therefore has only brute force as a processing option. It will be creamed whenever an index is the better access mechanism.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;One could argue that value of MapReduce is automatically providing parallel execution on a grid of computers. This feature was explored by the DBMS research community in the 1980s, and multiple prototypes were built including Gamma [2,3], Bubba [4], and Grace [5]. 
Commercialization of these ideas occurred in the late 1980s with systems such as Teradata.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;In summary to this first point, there have been high-performance, commercial, grid-oriented SQL engines (with schemas and indexing) for the past 20 years. MapReduce does not fare well when compared with such systems.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;There are also some lower-level implementation issues with MapReduce, specifically skew and data interchange.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;One factor that MapReduce advocates seem to have overlooked is the issue of skew. As described in “Parallel Database System: The Future of High Performance Database Systems,” [6] skew is a huge impediment to achieving successful scale-up in parallel query systems. 
The problem occurs in the map phase when there is wide variance in the distribution of records with the same key. This variance, in turn, causes some reduce instances to take much longer to run than others, resulting in the execution time for the computation being the running time of the slowest reduce instance. The parallel database community has studied this problem extensively and has developed solutions that the MapReduce community might want to adopt.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;There is a second serious performance problem that gets glossed over by the MapReduce proponents. Recall that each of the N map instances produces M output files — each destined for a different reduce instance. These files are written to a disk local to the computer used to run the map instance. If N is 1,000 and M is 500, the map phase produces 500,000 local files. When the reduce phase starts, each of the 500 reduce instances needs to read its 1,000 input files and must use a protocol like FTP to “pull” each of its input files from the nodes on which the map instances were run. With 100s of reduce instances running simultaneously, it is inevitable that two or more reduce instances will attempt to read their input files from the same map node simultaneously — inducing large numbers of disk seeks and slowing the effective disk transfer rate by more than a factor of 20. This is why parallel database systems do not materialize their split files and use push (to sockets) instead of pull. 
Since much of the excellent fault-tolerance that MapReduce obtains depends on materializing its split files, it is not clear whether the MapReduce framework could be successfully modified to use the push paradigm instead.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Given the experimental evaluations to date, we have serious doubts about how well MapReduce applications can scale. Moreover, the MapReduce implementers would do well to study the last 25 years of parallel DBMS research literature.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;3. MapReduce is not novel&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;The MapReduce community seems to feel that they have discovered an entirely new paradigm for processing large data sets. In actuality, the techniques employed by MapReduce are more than 20 years old. 
The idea of partitioning a large data set into smaller partitions was first proposed in “Application of Hash to Data Base Machine and Its Architecture” [11] as the basis for a new type of join algorithm. In “Multiprocessor Hash-Based Join Algorithms,” [7], Gerber demonstrated how Kitsuregawa’s techniques could be extended to execute joins in parallel on a shared-nothing [8] cluster using a combination of partitioned tables, partitioned execution, and hash based splitting. DeWitt [2] showed how these techniques could be adopted to execute aggregates with and without group by clauses in parallel. DeWitt and Gray [6] described parallel database systems and how they process queries. Shatdal and Naughton [9] explored alternative strategies for executing aggregates in parallel.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Teradata has been selling a commercial DBMS utilizing all of these techniques for more than 20 years; exactly the techniques that the MapReduce crowd claims to have invented.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;While MapReduce advocates will undoubtedly assert that being able to write MapReduce functions is what differentiates their software from a parallel SQL implementation, we would remind them 
that POSTGRES supported user-defined functions and user-defined aggregates in the mid 1980s. Essentially, all modern database systems have provided such functionality for quite a while, starting with the Illustra engine around 1995.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;4. MapReduce is missing features&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;All of the following features are routinely provided by modern DBMSs, and all are missing from MapReduce:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Bulk loader — to transform input data in files into a desired format and load it into a DBMS&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; 
mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Indexing — as noted above&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Updates — to change the data in the data base&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Transactions — to support parallel update and recovery from failures during update&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Integrity constraints — to help keep garbage out of the data base&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; 
mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Referential integrity — again, to help keep garbage out of the data base&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Views — so the schema can change without having to rewrite the application program&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;In summary, MapReduce provides only a sliver of the functionality found in modern DBMSs.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;5. 
MapReduce is incompatible with the DBMS tools&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;A modern SQL DBMS has available all of the following classes of tools:&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Report writers (e.g., Crystal reports) to prepare reports for human visualization&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Business intelligence tools (e.g., Business Objects or Cognos) to enable ad-hoc querying of large data warehouses&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; 
font-size: 9pt;"&gt;Data mining tools (e.g., Oracle Data Mining or IBM DB2 Intelligent Miner) to allow a user to discover structure in large data sets&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Replication tools (e.g., Golden Gate) to allow a user to replicate data from one DBMS to another&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;Database design tools (e.g., Embarcadero) to assist the user in constructing a data base.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;MapReduce cannot use these tools and has none of its own. 
Until it becomes SQL-compatible or until someone writes all of these tools, MapReduce will remain very difficult to use in an end-to-end task.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;In Summary&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;It is exciting to see a much larger community engaged in the design and implementation of scalable query processing techniques. We, however, assert that they should not overlook the lessons of more than 40 years of database technology — in particular the many advantages that a data model, physical and logical data independence, and a declarative query language, such as SQL, bring to the design, implementation, and maintenance of application programs. Moreover, computer science communities tend to be insular and do not read the literature of other communities. We would encourage the wider community to examine the parallel DBMS literature of the last 25 years. 
Last, before MapReduce can measure up to modern DBMSs, there is a large collection of unmet features and required tools that must be added.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="line-height: 10.75pt; margin-bottom: .0001pt; margin-bottom: 0in; margin-left: 0in; margin-right: 0in; margin-top: .1in; mso-para-margin-bottom: .0001pt; mso-para-margin-bottom: 0in; mso-para-margin-left: 0in; mso-para-margin-right: 0in; mso-para-margin-top: .6gd;"&gt;&lt;span style="color: black; font-family: Tahoma, sans-serif; font-size: 9pt;"&gt;We fully understand that database systems are not without their problems. The database community recognizes that database systems are too “hard” to use and is working to solve this problem. The database community can also learn something valuable from the excellent fault-tolerance that MapReduce provides its applications. Finally we note that some database researchers are beginning to explore using the MapReduce framework as the basis for building scalable database systems. The Pig[10] project at Yahoo! Research is one such effort.&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;br/&gt;Sorry, I do not have the original references, and it would be wrong to attempt to re-create them. 
I suggest Google may help.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-2872686103170722933?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/2872686103170722933/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/11/dewitt-and-stonebrakers-mapreduce-major.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/2872686103170722933'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/2872686103170722933'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/11/dewitt-and-stonebrakers-mapreduce-major.html' title='DeWitt and Stonebraker&apos;s &quot;MapReduce: A major step backwards&quot;'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-5263751225014980298</id><published>2009-09-10T21:38:00.000+01:00</published><updated>2009-09-10T22:05:57.948+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='compilation'/><title type='text'>Distributed Compilation using MapReduce</title><content type='html'>&lt;p class="MsoNormal"&gt;I've been toying with an idea of using MapReduce to distribute compilation of C++ applications. 
A full system build of a complex software system can take many hours to run using a traditional linear makefile. Even a multi-threaded build system has limitations, as contention for the hard disk becomes the bottleneck.&lt;/p&gt;&lt;p class="MsoNormal"&gt;So, perhaps something along the lines of a MapReduce system running a series of Jobs...&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;b&gt;Job 1: Build System (makefile/MSVC sln)&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: normal; "&gt;&lt;span class="Apple-style-span"  style="color:#009900;"&gt;map &lt;/span&gt;(makefile name, makefile contents) --&gt; list(source filename, dependent filename)&lt;br /&gt;&lt;span class="Apple-style-span"  style="color:#009900;"&gt;reduce&lt;/span&gt;(null)&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;b&gt;Job 2: Compile &amp;amp; Link applications/libraries (makefile/MSVC vcproj)&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: normal; "&gt;&lt;span class="Apple-style-span"  style="color:#009900;"&gt;map &lt;/span&gt;(source_filename, output_filename) --&gt; list(output_filename, object_filename)&lt;br /&gt;&lt;span class="Apple-style-span"  style="color:#009900;"&gt;reduce&lt;/span&gt; (list(object_filename), executable/shared library/static library)&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span lang="EN-GB"&gt;&lt;b&gt;Job 3: Package&lt;br /&gt;&lt;/b&gt;&lt;span class="Apple-style-span"  style="color:#009900;"&gt;map&lt;/span&gt;&lt;span style="mso-spacerun:yes"&gt; &lt;/span&gt;(content filenames, package filename) --&gt; list(package filename, content filename)&lt;br /&gt;&lt;span class="Apple-style-span"  style="color:#009900;"&gt;reduce&lt;/span&gt; (list(executable/shared library filenames), package [zip, msi, etc.])&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span lang="EN-GB"&gt;&lt;o:p&gt;The compilation step uses many dependent files (header files of the system, compiler 
headers and third-party libraries). These can be pre-loaded into the distributed file system for access if the DFS can be mounted natively in the OS filesystem.&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span lang="EN-GB"&gt;&lt;o:p&gt;If the DFS cannot be mounted in the OS filesystem, then the dependent files will not be available to the compiler on the MR job runner machines. In that case, we can redefine the Reduce phase of Job 1 to preprocess each source file, so that a single self-contained source file (i.e. one with no header file dependencies) is sent for compilation, using a compiler flag: -E for GCC, -P for MSVC.&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-5263751225014980298?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/5263751225014980298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/09/distributed-compilation-using-mapreduce.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5263751225014980298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5263751225014980298'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/09/distributed-compilation-using-mapreduce.html' title='Distributed Compilation using MapReduce'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' 
src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-7507861092925036914</id><published>2009-09-10T17:57:00.000+01:00</published><updated>2009-09-10T18:01:30.853+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='library'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><title type='text'>MapReduce C++ Library v0.4 available to download</title><content type='html'>I've been unable to post for a while. For those who are interested and missed the other announcement, v0.4 of my MapReduce library was made available on 29th August. It is in the Boost Sandbox (subversion access), and in the Boost Vault as a zip file at &lt;a href="http://www.boostpro.com/vault/index.php?action=downloadfile&amp;amp;filename=mapreduce_0_4.zip&amp;amp;directory=&amp;amp;"&gt;http://www.boostpro.com/vault/index.php?action=downloadfile&amp;amp;filename=mapreduce_0_4.zip&amp;amp;directory=&amp;amp;&lt;/a&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This version is functionally complete for single-machine multithreaded MapReduce programming, and runs in times comparable with other C-based libraries such as &lt;a href="http://mapreduce.stanford.edu/"&gt;Phoenix MapReduce&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-7507861092925036914?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/7507861092925036914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/09/mapreduce-c-library-v04-available-to.html#comment-form' title='0 Comments'/><link 
rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/7507861092925036914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/7507861092925036914'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/09/mapreduce-c-library-v04-available-to.html' title='MapReduce C++ Library v0.4 available to download'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-5552418224106091956</id><published>2009-09-10T09:20:00.000+01:00</published><updated>2009-09-10T18:05:46.430+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='New Home'/><title type='text'>A new home for my Blog</title><content type='html'>I have had to move my Blog from my domain &lt;a href="http://www.craighenderson.co.uk/"&gt;http://www.craighenderson.co.uk&lt;/a&gt; because my hosting company are useless. 
I'm not going to rant about that here, but just want to explain the reason.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have painlessly imported my self-hosted WordPress blog into Blogger after reading this post: &lt;a href="http://www.tothepc.com/archives/import-convert-wordpress-blog-into-blogger-blog/"&gt;http://www.tothepc.com/archives/import-convert-wordpress-blog-into-blogger-blog/&lt;/a&gt; and following the simple detailed instructions here &lt;a href="http://wordpress2blogger.appspot.com/"&gt;http://wordpress2blogger.appspot.com/&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks for your posts, guys!&lt;/div&gt;&lt;div&gt;-- Craig&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-5552418224106091956?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/5552418224106091956/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/09/i-have-had-to-move-my-blog-from-my.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5552418224106091956'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5552418224106091956'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/09/i-have-had-to-move-my-blog-from-my.html' title='A new home for my Blog'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' 
src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-5906328195904401799</id><published>2009-07-21T23:42:00.000+01:00</published><updated>2009-09-10T14:05:59.367+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='library'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><title type='text'>MapReduce C++ Library v0.2 available to download</title><content type='html'>I've posted v0.2 of my C++ MapReduce library on the Boost Vault. It is &lt;a href="http://www.boostpro.com/vault/index.php?action=downloadfile&amp;amp;filename=mapreduce_0_2.zip&amp;amp;directory=&amp;amp;"&gt;downloadable directly from here&lt;/a&gt;. Updates in this release are:&lt;br/&gt;&lt;ul&gt;&lt;br/&gt;	&lt;li&gt;Moved the library into the boost namespace&lt;/li&gt;&lt;br/&gt;	&lt;li&gt;Created PartitionFn template parameter on intermediates::local_disk to enable customisation of the partitioning of data into result files&lt;/li&gt;&lt;br/&gt;	&lt;li&gt;Use of &lt;code&gt;BOOST_THROW_EXCEPTION&lt;/code&gt; in place of throw&lt;/li&gt;&lt;br/&gt;	&lt;li&gt;Rationalised and completed include guards&lt;/li&gt;&lt;br/&gt;	&lt;li&gt;Support for gcc 4.3.3 on Ubuntu Linux&lt;/li&gt;&lt;br/&gt;&lt;/ul&gt;&lt;br/&gt;Online documentation is &lt;a href="http://www.craighenderson.co.uk/mapreduce/"&gt;here&lt;/a&gt;.&lt;br/&gt;&lt;ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-5906328195904401799?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/5906328195904401799/comments/default' title='Post Comments'/><link rel='replies' 
type='text/html' href='http://craig-henderson.blogspot.com/2009/07/mapreduce-c-library-v02-available-to.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5906328195904401799'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5906328195904401799'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/mapreduce-c-library-v02-available-to.html' title='MapReduce C++ Library v0.2 available to download'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-3216114511537968172</id><published>2009-07-20T22:42:00.000+01:00</published><updated>2009-09-10T14:05:59.360+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='programming languages'/><title type='text'>MapReduce runtime language choices</title><content type='html'>I'm not interested in language wars or general arguments over relative merits of a particular technology choice, but a recent blog post caught my attention as it flashed up on my Google Alert email. 
Entitled &lt;a href="http://www.trendcaller.com/2009/07/java-performance-does-not-scale-for.html" target="_self"&gt;Yahoo's infrastructural disadvantage to Google: Java performance does not scale&lt;/a&gt;, the author (Kevin Lawton) presents some experimental results in benchmarking scalability of Java and C++.&lt;br/&gt;&lt;blockquote&gt;Yahoo(&lt;a href="http://seekingalpha.com/symbol/yhoo" target="_blank"&gt;YHOO&lt;/a&gt;) uses a Java-based MapReduce infrastructure called Hadoop. This article demonstrates why Java performance does not scale well for large scale compute settings, relative to C++, which is what Google(&lt;a href="http://seekingalpha.com/symbol/goog" target="_blank"&gt;GOOG&lt;/a&gt;) uses for their MapReduce infrastructure.&lt;/blockquote&gt;&lt;br/&gt;What I particularly liked about this post is that Kevin does not just compare languages against each other, but compares scalability of each language against itself and then against the other.&lt;br/&gt;&lt;br/&gt;We all have choices to make, and there will never be a one-size-fits-all technology. 
The great thing about software - and open-source in particular - is that a lot of very smart people work to provide alternative technology solutions so that we can make these choices freely to best suit our needs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-3216114511537968172?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/3216114511537968172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/mapreduce-runtime-language-choices.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/3216114511537968172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/3216114511537968172'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/mapreduce-runtime-language-choices.html' title='MapReduce runtime language choices'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-856050113461994388</id><published>2009-07-20T22:21:00.000+01:00</published><updated>2009-09-10T14:05:59.353+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><title type='text'>Online C++ MapReduce library documentation</title><content type='html'>As I posted yesterday, I have uploaded 
the first release of my C++ MapReduce library to the Boost Vault. Development and testing continue, so I have uploaded the Boost-ified documentation to my site at &lt;a href="http://www.craighenderson.co.uk/mapreduce/"&gt;http://www.craighenderson.co.uk/mapreduce/&lt;/a&gt;, which I can update much more easily than the zip file in the vault.&lt;br/&gt;&lt;br/&gt;Importantly, I have included a Change Log on the front page. This includes the changes that have been made to the library since I uploaded it to the vault, and I'll keep it up to date as updated releases appear, and when I get the code into the Boost Sandbox (Subversion).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-856050113461994388?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/856050113461994388/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/online-c-mapreduce-library.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/856050113461994388'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/856050113461994388'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/online-c-mapreduce-library.html' title='Online C++ MapReduce library documentation'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' 
src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-9130748966916707906</id><published>2009-07-19T18:57:00.000+01:00</published><updated>2009-09-10T14:05:59.344+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='source code'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><title type='text'>MapReduce C++ library available for download</title><content type='html'>I have uploaded the first public release of my MapReduce library to the Boost Vault. The ZIP file is very small, 38 KB, and can be downloaded here: &lt;a href="http://tinyurl.com/boost-mapreduce-zip"&gt;http://tinyurl.com/boost-mapreduce-zip&lt;/a&gt;.&lt;br/&gt;&lt;br/&gt;The library is fully functional on Windows with MSVC8, and compiles cleanly with GCC 3.4.4 on Cygwin (but has not been tested there). Documentation is included in the zip and provides a tutorial and example application. A full reference document of the classes in the library is not yet included, but I’m working on it.&lt;br/&gt;&lt;br/&gt;If you decide to download and use/try the library, please let me know. 
I'm keen to hear comments about the library design and usability.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-9130748966916707906?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/9130748966916707906/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/mapreduce-c-library-available-for.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/9130748966916707906'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/9130748966916707906'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/mapreduce-c-library-available-for.html' title='MapReduce C++ library available for download'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-7617401855195251496</id><published>2009-07-15T17:29:00.000+01:00</published><updated>2009-09-10T14:05:59.337+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><title type='text'>MapReduce is good for distributed processing, too</title><content type='html'>&lt;div&gt;&lt;br/&gt;&lt;br/&gt;MapReduce was devised for data-intensive applications to maximise the throughput of data by using multiple machines to process data in parts. There are other uses for this programming idiom, too. 
My initial &lt;a href="http://craighenderson.co.uk/blog/index.php/2009/06/17/mapreduce-in-c/" target="_self"&gt;MapReduce implementation in C++&lt;/a&gt; is designed for processor-intensive algorithms, and I see a lot of value in this as a mechanism for dealing with the complexities of multi-threaded data processing in C++.&lt;br/&gt;&lt;br/&gt;I was therefore pleased to read that Yahoo! has used its Hadoop clusters to calculate the 10&lt;sup&gt;15&lt;/sup&gt;+1st bit of pi. Read more on the Yahoo! Developer Network blog at &lt;a href="http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_computes_the_10151st_bi.html" target="_blank"&gt;http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_computes_the_10151st_bi.html&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-7617401855195251496?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/7617401855195251496/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/mapreduce-is-good-for-distributed.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/7617401855195251496'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/7617401855195251496'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/mapreduce-is-good-for-distributed.html' title='MapReduce is good for distributed processing, too'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' 
src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-6359765165633108562</id><published>2009-07-09T15:54:00.002+01:00</published><updated>2010-09-28T09:54:59.245+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Distributed File System'/><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><title type='text'>Unplanned side-effect of a decentralised distributed file system</title><content type='html'>I identified early on in the design of my distributed file system that to achieve my goal of being completely decentralised - i.e. there is no single server which co-ordinates the activities or stores critical data - the design of the directory service will be both critical and challenging.&lt;br /&gt;&lt;br /&gt;I have a logical design for the directory service and have begun an experimental implementation which is proving that the concept is feasible. A very happy yet unplanned side-effect of my design is that every node in the system is resilient to the loss of its own catalog file. In fact, the catalog file is merely a local optimisation for system startup, and is completely expendable.&lt;br /&gt;&lt;br /&gt;Perhaps time will prove this is not the case, but I cannot think of any situation where the catalog file becomes critical in the system. 
The information within the file is critical, obviously, but that information is distributed and recreatable as well as being resilient with redundant distribution.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-6359765165633108562?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/6359765165633108562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/unplanned-side-effect-of-decentralised.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6359765165633108562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6359765165633108562'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/unplanned-side-effect-of-decentralised.html' title='Unplanned side-effect of a decentralised distributed file system'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-5996341387332053775</id><published>2009-07-03T20:38:00.000+01:00</published><updated>2009-09-10T14:05:59.214+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Distributed File System'/><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><title type='text'>Configuration-less distributed file system running on Windows</title><content 
type='html'>&lt;a href="http://craighenderson.co.uk/blog/wp-content/uploads/2009/07/dfs_win.png"&gt;&lt;img class="alignright size-thumbnail wp-image-28" title="DFS on Windows" src="http://craighenderson.co.uk/blog/wp-content/uploads/2009/07/dfs_win-150x150.png" alt="" width="150" height="150" /&gt;&lt;/a&gt;Here are some initial screenshots of my distributed file system running on Microsoft Windows XP on two laptops. The system can be run without parameters, in which case a default multicast address is used for notifications and a default port is used for the block distribution. Alternatively, these can be specified on the command line.&lt;br/&gt;&lt;br/&gt;This flexibility means that dfs runs "out of the box" on a network of machines without any configuration, which is a primary design goal of the system. For any industrial-strength system, configuration is a must, of course, but should not be a barrier to getting it up and running for those curious to try it out.&lt;br/&gt;&lt;br/&gt;The green window is running a test environment on a remote laptop (machine name chenderson630). All other windows are running on my development box (compaq6715b), each with a different block distribution port configured. This way I can test the software locally without worrying about re-deploying it to a number of different machines across the network.&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;Command-line interface. &lt;span style="font-weight: normal;"&gt;There is a simple command-line interface to the dfs. From here, I can &lt;em&gt;bounce&lt;/em&gt; the dfs instance without having to kill the process (see the pink window), and I can list the peer dfs instances that are currently 'connected' to a particular instance.&lt;/span&gt;&lt;/strong&gt;&lt;br/&gt;&lt;br/&gt;There's much more to do, including dusting off my Ubuntu box and building a Linux variant. 
This should be easy as I've used standard, portable C++, and &lt;a title="Boost.org" href="http://www.boost.org" target="_blank"&gt;Boost&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-5996341387332053775?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/5996341387332053775/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/configuration-less-distributed-file.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5996341387332053775'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5996341387332053775'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/07/configuration-less-distributed-file.html' title='Configuration-less distributed file system running on Windows'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-6940755424260566643</id><published>2009-06-19T20:05:00.000+01:00</published><updated>2009-09-10T14:05:59.163+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Distributed File System'/><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><title type='text'>Distributed File System for 
MapReduce</title><content type='html'>I am now designing the Distributed File System for use with my MapReduce library. The aim is to achieve a scalable and resilient DFS which is peerless and heterogeneous across all platforms; 32/64-bit, Windows, Linux, etc.&lt;br/&gt;&lt;br/&gt;I have some ideas about how the peerless data block distribution is going to work, so there will be no central server and therefore no single point of failure. The directory service is a challenge - my first idea is to treat the directory structure as 'just another file' in the filesystem and use the block distribution mechanism with a high replication value, but I'm not sure about that part...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-6940755424260566643?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/6940755424260566643/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/06/distributed-file-system-for-mapreduce.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6940755424260566643'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6940755424260566643'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/06/distributed-file-system-for-mapreduce.html' title='Distributed File System for MapReduce'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' 
src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-6203745366900449555</id><published>2009-06-18T17:42:00.000+01:00</published><updated>2009-09-10T14:05:59.154+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><title type='text'>Recursive MapReduce</title><content type='html'>A curious thought occurred to me. One of the biggest challenges in implementing the MapReduce framework is sorting large volumes of intermediate key/value pairs in temporary files. My first implementation uses the OS sort - using a call to C system() - which is clearly not a good solution, although functional. I've implemented a recursive divide and conquer merge-sort which works nicely, but what if...&lt;br/&gt;&lt;br/&gt;Could implementing a Sort function in MapReduce provide a Sort function &lt;em&gt;for &lt;/em&gt;MapReduce? A curiously-recurring-MapReduce-pattern, perhaps?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-6203745366900449555?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/6203745366900449555/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/06/recursive-mapreduce.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6203745366900449555'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6203745366900449555'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/06/recursive-mapreduce.html' title='Recursive 
MapReduce'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-9136936222634246322</id><published>2009-06-17T20:19:00.000+01:00</published><updated>2009-09-10T14:05:59.091+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Boost'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel programming'/><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><title type='text'>MapReduce in C++</title><content type='html'>&lt;a href="http://labs.google.com/papers/mapreduce.html" target="_blank"&gt;MapReduce&lt;/a&gt; is a programming model from Google that is designed for scalable data processing. Google's implementation is for scalability over many thousands of commodity machines, but there is value in using the idiom on multi-core processors to partition processing to efficiently use the CPU available, and avoid the complexities of multi-threaded development.&lt;br/&gt;&lt;br/&gt;I have implemented a MapReduce runtime library in C++ for single-machine applications using the &lt;a href="http://boost.org" target="_blank"&gt;Boost&lt;/a&gt; libraries. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. 
Many real world tasks are expressible in this model, as shown in the Google paper.&lt;br/&gt;&lt;br/&gt;&lt;code&gt;map (k1,v1) --&amp;gt; list(k2,v2)&lt;br/&gt;reduce (k2,list(v2)) --&amp;gt; list(v2)&lt;br/&gt;&lt;/code&gt;&lt;br/&gt;You can read about the library at &lt;a href="http://www.craighenderson.co.uk/mapreduce/"&gt;http://www.craighenderson.co.uk/mapreduce/&lt;/a&gt;. I have also started a discussion thread on the Boost developers list (&lt;a href="http://lists.boost.org/Archives/boost/2009/06/152961.php"&gt;http://lists.boost.org/Archives/boost/2009/06/152961.php&lt;/a&gt;) to see if there is interest in submitting the implementation to the Boost library.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-9136936222634246322?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/9136936222634246322/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2009/06/mapreduce-in-c.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/9136936222634246322'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/9136936222634246322'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2009/06/mapreduce-in-c.html' title='MapReduce in C++'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' 
src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-7731125767644115381</id><published>2008-10-31T14:18:00.000Z</published><updated>2009-09-10T14:05:59.074+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='user interface'/><title type='text'>Screenshot attachments</title><content type='html'>A customer reported a bug this week and attached a screenshot to the Bugzilla entry to help us to understand the problem they were facing. The attachment was an MS Word document containing a single screenshot image. This got us thinking about the problem of creating a screenshot to upload to a web-based system like Bugzilla, and we arrived at a simple solution.&lt;br/&gt;&lt;br/&gt;In the standard File Open dialog in Windows, wouldn’t it be great if the user could paste the clipboard? A temporary file would be created with the contents of the clipboard, that file would then be selected automatically, and the Open dialog would close by itself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-7731125767644115381?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/7731125767644115381/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2008/10/screenshot-attachments.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/7731125767644115381'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4377882162568277664/posts/default/7731125767644115381'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2008/10/screenshot-attachments.html' title='Screenshot attachments'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-1029811480555993355</id><published>2008-10-31T14:16:00.000Z</published><updated>2009-09-10T14:05:59.047+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='debugging'/><category scheme='http://www.blogger.com/atom/ns#' term='crash'/><title type='text'>Unhandled Exceptions in WinDbg</title><content type='html'>When a process crashes and WinDbg pops up as the JIT debugger, here's how to make some sort of sense of it all. In the WinDbg command window, type &lt;strong&gt;~*kv&lt;/strong&gt; and hit return. This will dump the top of the stack for all threads. 
Look down the list for an entry containing &lt;strong&gt;kernel32!UnhandledExceptionFilter&lt;/strong&gt;, like the one below.&lt;br/&gt;&lt;div style="border:solid 1px #cccccc; overflow:auto;"&gt;&lt;br/&gt;&lt;pre style="font-family:courier; font-size:x-small"&gt;ChildEBP RetAddr  Args to Child&lt;br/&gt;0012968c 77f7f49f 77e74bd8 00000002 001296d8 SharedUserData!SystemCallStub+0x4    (FPO: [0,0,0])&lt;br/&gt;00129690 77e74bd8 00000002 001296d8 00000001 ntdll!ZwWaitForMultipleObjects+0xc    (FPO: [5,0,0])&lt;br/&gt;0012972c 77e74c70 00000002 00129800 00000000 kernel32!WaitForMultipleObjectsEx+0x12c    (FPO: [Non-Fpo])&lt;br/&gt;00129744 69455a73 00000002 00129800 00000000 kernel32!WaitForMultipleObjects+0x17    (FPO: [4,0,0])&lt;br/&gt;00129ec8 69456ba3 0012b49c ffffffff 00008310 faultrep!StartDWException+0x575    (FPO: [Non-Fpo])&lt;br/&gt;0012af30 77eb9d66 0012b49c ffffffff c0000005 faultrep!ReportFault+0x488 (FPO:    [Non-Fpo])&lt;br/&gt;&lt;span style="color: #ff0000;"&gt;0012b454 77c313c8 &lt;/span&gt;&lt;strong&gt;&lt;span style="font-family:courier; font-size:x-small; color: #ff0000;"&gt;0012b49c&lt;/span&gt;&lt;/strong&gt;&lt;span style="color: #ff0000;"&gt; 00409e20 00000000 kernel32!UnhandledExceptionFilter+0x2e2    (FPO: [Non-Fpo])&lt;br/&gt;&lt;/span&gt;0012b470 00404f4d 00000000 0012b49c 77c33efb msvcrt!_XcptFilter+0x15f (FPO:    [Non-Fpo])&lt;br/&gt;0012ffc0 77e7eb69 0012ef40 00000001 7ffdf000 WgnDV!WinMainCRTStartup+0x14f&lt;br/&gt;0012fff0 00000000 00404dfe 00000000 78746341 kernel32!BaseProcessStart+0x23    (FPO: [Non-Fpo])&lt;/pre&gt;&lt;br/&gt;&lt;/div&gt;&lt;br/&gt;Take the first parameter of the unhandled exception filter (in &lt;strong&gt;&lt;span style="color: #ff0000;"&gt;red&lt;/span&gt;&lt;/strong&gt;&lt;span style="color: #ff0000;"&gt; &lt;/span&gt;above), and type the following command in the command window&lt;br/&gt;&lt;br/&gt;0:009&amp;gt; dd 0012b49c&lt;br/&gt;&lt;br/&gt;and the output should read something along the lines 
of&lt;br/&gt;&lt;pre style="font-family:courier; font-size:small"&gt;0012b49c  &lt;span style="color: #0000ff;"&gt;0012b58c &lt;/span&gt;&lt;span style="color: #ff00ff;"&gt;0012b5a8 &lt;/span&gt;0012b4c8 77f833a0&lt;br/&gt;0012b4ac  0012b58c 0012ffb0 0012b5a8 0012b564&lt;br/&gt;0012b4bc  0012fdcc 77f833b4 0012ffb0 0012b574&lt;br/&gt;0012b4cc  77f83372 0012b58c 0012ffb0 0012b5a8&lt;br/&gt;0012b4dc  0012b564 00404fb4 00000001 0012b58c&lt;br/&gt;0012b4ec  0012ffb0 77f617ee 0012b58c 0012ffb0&lt;br/&gt;0012b4fc  0012b5a8 0012b564 00404fb4 00000000&lt;br/&gt;0012b50c  0012b58c 000006d2 74939311 00000000&lt;/pre&gt;&lt;br/&gt;The first column is the address being dumped, and the values start in column 2. The first hex value, in &lt;span style="color: #0000ff;"&gt;blue&lt;/span&gt;, is the address of the Exception Record, and the second, in &lt;span style="color: #ff00ff;"&gt;pink&lt;/span&gt;, is the address of the Context Record.&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;Exception Record&lt;/strong&gt;&lt;br/&gt;&lt;div style="border:solid 1px #cccccc; overflow:auto;"&gt;&lt;br/&gt;&lt;pre style="font-family:courier; font-size:small"&gt;0:009&amp;gt; .exr &lt;span style="color: #0000ff;"&gt;0012b58c&lt;/span&gt;&lt;br/&gt;ExceptionAddress: 7492adb4 (mshtml!CLSRenderer::BlastLineToScreen+0x00000143)&lt;br/&gt;ExceptionCode: c0000005 (Access violation)&lt;br/&gt;ExceptionFlags: 00000000&lt;br/&gt;NumberParameters: 2&lt;br/&gt;   Parameter[0]: 00000000&lt;br/&gt;   Parameter[1]: 00000000&lt;br/&gt;Attempt to read from address 00000000&lt;/pre&gt;&lt;br/&gt;&lt;/div&gt;&lt;br/&gt;&lt;strong&gt;Context Record&lt;/strong&gt;&lt;br/&gt;&lt;pre style="font-family:courier;"&gt;0:009&amp;gt; .cxr &lt;span style="color: #ff00ff;"&gt;0012b5a8&lt;/span&gt;&lt;br/&gt;eax=001dc81e ebx=0012baec ecx=00000000 edx=00000072 esi=000006d2 edi=00000000&lt;br/&gt;eip=7492adb4 esp=0012b874 ebp=0012b930 iopl=0         nv up ei pl nz na pe nc&lt;br/&gt;cs=001b  ss=0023  ds=0023  es=0023  fs=003b  
gs=0000             efl=00010202&lt;br/&gt;mshtml!CLSRenderer::BlastLineToScreen+0x143:&lt;br/&gt;7492adb4 8b01             mov     eax,[ecx]         ds:0023:00000000=????????&lt;/pre&gt;&lt;br/&gt;The .cxr command displays the context record, and also sets the register context. So following the command, the thread context is set to the location of the crash, in this case an Access Violation. A large stack dump will show the call stack leading up to the crash:&lt;br/&gt;&lt;div style="border:solid 1px #cccccc; overflow:auto;"&gt;&lt;br/&gt;&lt;pre style="font-family:courier; font-size:x-small"&gt;0:009&amp;gt; kv 100&lt;br/&gt;ChildEBP RetAddr  Args to Child&lt;br/&gt;0012b9a4 7492a14c 0012bd24 0000026e 01318f28 mshtml!CLSRenderer::BlastLineToScreen+0x143 (FPO: [1,73,3])&lt;br/&gt;0012ba18 7492a026 0012bd24 00000000 0012f9c4 mshtml!CLSRenderer::RenderLine+0x3c7 (FPO: [Non-Fpo])&lt;br/&gt;0012bdcc 74929d7c 0000026e 0012fa1c 00000000 mshtml!CDisplay::Render+0x3b1 (FPO: [4,230,3])&lt;br/&gt;0012bde0 7492959d 0012f9c4 012e5370 012fad30 mshtml!CFlowLayout::Draw+0x19 (FPO: [2,0,0])&lt;br/&gt;0012be14 749296ff 0012c170 0012c1b4 012d1f00 mshtml!CLayout::DrawClient+0x72 (FPO: [Non-Fpo])&lt;br/&gt;0012c1e8 7492702c 012fad30 00000000 fffffffc mshtml!CDispLeafNode::DrawSelf+0x37a (FPO: [3,233,3])&lt;br/&gt;0012c3c8 74929129 012fad30 00000000 00000007 mshtml!CDispNode::Draw+0xe9 (FPO: [3,112,3])&lt;br/&gt;0012c3e4 74928d92 012fad30 0012c6fc 00000000 mshtml!CDispContainer::DrawChildren+0x31 (FPO: [2,0,3])&lt;br/&gt;0012c6f0 7492702c 012fad30 012e5370 fffffffc mshtml!CDispContainer::DrawSelf+0x204 (FPO: [3,188,3])&lt;br/&gt;0012c8d0 74929129 012fad30 00000000 00000007 mshtml!CDispNode::Draw+0xe9 (FPO: [3,112,3])&lt;br/&gt;0012c8ec 74928d92 012fad30 0012cc04 00000000 mshtml!CDispContainer::DrawChildren+0x31 (FPO: [2,0,3])&lt;br/&gt;0012cbf8 7492702c 012fad30 012e5304 fffffffc mshtml!CDispContainer::DrawSelf+0x204 (FPO: [3,188,3])&lt;br/&gt;0012cdd8 748f1928 012fad30 012e5304 
00000007 mshtml!CDispNode::Draw+0xe9 (FPO: [3,112,3])&lt;br/&gt;0012de04 748f181d 012fad30 0012e16c 00000000 mshtml!CDispRoot::DrawBand+0xc9 (FPO: [2,1027,3])&lt;br/&gt;0012e180 74926f4b 0131b6f0 012d1f00 012fad30 mshtml!CDispRoot::DrawBands+0xe1 (FPO: [Non-Fpo])&lt;br/&gt;0012f534 74927684 0131b6f0 012d1f00 012fad30 mshtml!CDispRoot::DrawRoot+0x27e (FPO: [6,1250,3])&lt;br/&gt;0012f654 74927507 0012f9c4 aa0407f1 00000000 mshtml!CView::RenderView+0x1c6 (FPO: [4,61,3])&lt;br/&gt;0012fae0 7493820a 00000000 0018a7a0 0000000f mshtml!CDoc::OnPaint+0x3fa (FPO: [0,282,3])&lt;br/&gt;0012fb00 7493811a 0018a7a0 0000000f 00000000 mshtml!CServer::OnWindowMessage+0x36e (FPO: [Non-Fpo])&lt;br/&gt;0012fc18 74938006 00000000 0000000f 00000000 mshtml!CDoc::OnWindowMessage+0xbd4 (FPO: [5,60,3])&lt;br/&gt;0012fd48 77d43a5f 0018013c 0000000f 00000000 mshtml!CServer::WndProc+0x86 (FPO: [Non-Fpo])&lt;br/&gt;0012fd74 77d43b2e 74937fac 0018013c 0000000f user32!InternalCallWinProc+0x1b&lt;br/&gt;0012fddc 77d45874 00000000 74937fac 0018013c user32!UserCallWinProcCheckWow+0x150 (FPO: [Non-Fpo])&lt;br/&gt;0012fe30 77d458a4 00631d88 0000000f 00000000 user32!DispatchClientMessage+0xa3 (FPO: [Non-Fpo])&lt;br/&gt;0012fe58 77f5108f 0012fe68 00000018 00631d88 user32!__fnDWORD+0x22 (FPO: [Non-Fpo])&lt;br/&gt;0012fe7c 77d45927 77d4593a 0040d7dc 0040d7dc ntdll!KiUserCallbackDispatcher+0x13 (FPO: [0,0,0])&lt;br/&gt;0012fec4 77d441fd 0040d7dc 00000001 73dd11ce user32!NtUserDispatchMessage+0xc (FPO: [Non-Fpo])&lt;br/&gt;0012fed0 73dd11ce 0040d7dc 00000001 0040d7a8 user32!DispatchMessageA+0xb (FPO: [1,0,0])&lt;br/&gt;0012fee0 73dd91a4 0040d7a8 0040d7a8 0012ffc0 mfc42!CWinThread::PumpMessage+0x3a (FPO: [0,0,2])&lt;br/&gt;0012fef8 73dd9154 0040d7a8 73ddb4fe 0012ef40 mfc42!CWinThread::Run+0x47 (FPO: [EBP 0x0012ffc0] [0,1,4])&lt;br/&gt;0012ff00 73ddb4fe 0012ef40 001423ec 00000000 mfc42!CWinApp::Run+0x20 (FPO: [0,0,1])&lt;br/&gt;0012ff10 00407f56 00400000 00000000 001423ec mfc42!AfxWinMain+0x67 (FPO: 
[4,0,3])&lt;br/&gt;0012ff24 00404f32 00400000 00000000 001423ec WgnDV!WinMain+0x15 (FPO: [4,0,0]) (CONV: stdcall) [appmodul.cpp @ 30]&lt;br/&gt;0012ffc0 77e7eb69 0012ef40 00000001 7ffdf000 WgnDV!WinMainCRTStartup+0x134&lt;br/&gt;0012fff0 00000000 00404dfe 00000000 78746341 kernel32!BaseProcessStart+0x23 (FPO: [Non-Fpo])&lt;/pre&gt;&lt;br/&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-1029811480555993355?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/1029811480555993355/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2008/10/unhandled-exceptions-in-windbg.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/1029811480555993355'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/1029811480555993355'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2008/10/unhandled-exceptions-in-windbg.html' title='Unhandled Exceptions in WinDbg'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-5124198681849617252</id><published>2008-10-31T14:13:00.000Z</published><updated>2009-09-10T14:05:58.980+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='defect management'/><category scheme='http://www.blogger.com/atom/ns#' 
term='software development'/><category scheme='http://www.blogger.com/atom/ns#' term='articles'/><title type='text'>ACM Author Homepage</title><content type='html'>Following the publication of my article &lt;strong&gt;Managing software defects: defect analysis and traceability&lt;/strong&gt; in &lt;em&gt;ACM SIGSOFT Software Engineering Notes&lt;/em&gt;, the ACM have created my very own Author Homepage &lt;a href="http://portal.acm.org/author_page.cfm?id=81363594884" target="_blank"&gt;here&lt;/a&gt;, complete with a download counter and other bibliometrics. If you are not a member and would like to read the article, contact me and I'll sort something out.&lt;br/&gt;&lt;br/&gt;Here's the abstract:&lt;br/&gt;&lt;blockquote&gt;&lt;em&gt;This paper describes a mechanism for presenting software defect metrics to aid analysis. A graphical representation of the history of software builds is presented, that records software build quality in a way that cannot be displayed in a single numerical table, and is visually more appealing and more easily digestible than a series of related tables. 
The radial analysis charts can be used to represent derivative information in a two-dimensional form and is demonstrated with practical examples of Defect Analysis and Root Cause Analysis.&lt;/em&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-5124198681849617252?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/5124198681849617252/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2008/10/acm-author-homepage.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5124198681849617252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/5124198681849617252'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2008/10/acm-author-homepage.html' title='ACM Author Homepage'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4377882162568277664.post-6387243123062071764</id><published>2008-10-31T13:57:00.000Z</published><updated>2009-09-10T14:05:58.646+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='personal'/><category scheme='http://www.blogger.com/atom/ns#' term='websites'/><title type='text'>A new blog</title><content type='html'>I got my first internet connection at home in 1995, and used the internet even before that. 
For all of this time I have had a personal website in one form or another, but I have never before had a blog. Until now. Rather than replace my site with a blog, I have added the blog to my existing site. I will continue to update and maintain the main site, and this blog will be a dynamic addition.&lt;br/&gt;&lt;br/&gt;Here's a scattered history of my web presence:&lt;br/&gt;1997 KC-Imaging; still available at the Internet Archive Wayback Machine &lt;a href="http://web.archive.org/web/19980702053215/http://www.kc-imaging.demon.co.uk/"&gt;here&lt;/a&gt;.&lt;br/&gt;2001 &lt;a href="http://freespace.virgin.net/cdm.henderson/"&gt;http://freespace.virgin.net/cdm.henderson/&lt;/a&gt;&lt;br/&gt;2007 MyMortgagePlan.co.uk (retired)&lt;br/&gt;2008 I finally registered my own &lt;a href="http://www.craighenderson.co.uk/"&gt;personal domain name&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4377882162568277664-6387243123062071764?l=craig-henderson.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://craig-henderson.blogspot.com/feeds/6387243123062071764/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://craig-henderson.blogspot.com/2008/10/new-blog.html#comment-form' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6387243123062071764'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4377882162568277664/posts/default/6387243123062071764'/><link rel='alternate' type='text/html' href='http://craig-henderson.blogspot.com/2008/10/new-blog.html' title='A new blog'/><author><name>Craig Henderson</name><uri>http://www.blogger.com/profile/17037181466246579343</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='30' 
src='http://4.bp.blogspot.com/_8UN9ALNciHU/S93B02nxiPI/AAAAAAAAAG8/tGUxJAvcCMI/s1600-R/port.jpg'/></author><thr:total>11</thr:total></entry></feed>
