Brown Bag Seminar
Date: February 24, 2010
Time: 12 PM
Location: ITE 325
Speaker: David Chapman
Title: Map Reduce for Scientific Applications on Hybrid Multicore Clusters
Map Reduce is a programming paradigm popularized by Google for
very large data set computations. Current infrastructure, such as Apache
Hadoop, offers a practical open source solution for many data mining and
text analysis problems, but leaves much performance to be desired for
the scientific domain as a whole. Science presents several
fundamentally new challenges for the paradigm. We are discovering, developing
and testing novel approaches to improve performance for the scientific domain.
Although our technology is novel, our strategy is straightforward. We will
demonstrate scientific Map Reduce via three milestones.
First, we will enumerate differences between scientific computing and data
mining that can hinder Map Reduce performance. Second, we will develop
techniques and heuristics to exploit or overcome these
differences. Finally, we will experimentally show how these techniques
can improve performance on actual science problems.
The first milestone is already accomplished as we have identified
multidimensional spatial locality, regular data distribution, relatively
modest data scale, and increased computation to be overarching
characteristics that typically distinguish scientific data
processing from text retrieval. We have made strides toward the second and third
milestones by identifying message passing and sorting as
critical design decisions in generalizing Map Reduce for scientific
applications. We have developed a novel Map Reduce framework for clusters of
Cell B.E. multicore processors that features a filesystem-like variation on
linear time sorting, and have tested this framework on the remote sensing
geolocation and gridding problem.
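
The abstract does not describe the framework's sorting scheme in detail, so the following is only a minimal sketch of the general idea behind linear-time sorting in a shuffle phase, not the speaker's implementation. It assumes keys are small non-negative integers (for example, grid cell indices), which lets values be grouped by direct bucket placement instead of a comparison sort. The function name `bucket_shuffle` and the sample data are hypothetical.

```python
from typing import Iterable, List, Tuple


def bucket_shuffle(pairs: Iterable[Tuple[int, float]], num_keys: int) -> List[List[float]]:
    """Group values by integer key in O(n + num_keys) time, without comparisons.

    Assumption: every key is an integer in the range [0, num_keys).
    """
    buckets: List[List[float]] = [[] for _ in range(num_keys)]
    for key, value in pairs:
        if not 0 <= key < num_keys:
            raise ValueError(f"key {key} is outside [0, {num_keys})")
        # Direct placement: the key itself is the bucket index.
        buckets[key].append(value)
    return buckets


# Hypothetical mapper output: (key, value) pairs in arbitrary order.
pairs = [(2, 0.5), (0, 1.0), (2, 1.5), (1, 3.0)]
grouped = bucket_shuffle(pairs, num_keys=3)
```

Because the cost grows with the number of pairs plus the number of possible keys, this approach only pays off when the key range is bounded and known in advance, which is consistent with the regular data distribution mentioned above.
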
We have yet to test multidimensional keys and data pre-distribution
strategies, but we have reason to expect these new
techniques to improve performance even further.
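
As a rough illustration of what a multidimensional key could look like for the gridding problem, the sketch below maps each observation to a (row, column) grid cell and averages the values per cell. It is an assumption-laden example, not the framework described in the talk: the names `map_to_cell` and `grid_average`, the one-degree cell size, and the sample observations are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Observation = Tuple[float, float, float]  # (latitude, longitude, measurement)
CellKey = Tuple[int, int]                 # two-dimensional key: (row, column)


def map_to_cell(obs: Observation, cell_deg: float = 1.0) -> Tuple[CellKey, float]:
    """Map step: emit a two-dimensional grid cell key for one observation."""
    lat, lon, value = obs
    row = int((lat + 90.0) // cell_deg)   # rows counted from the south pole
    col = int((lon + 180.0) // cell_deg)  # columns counted from 180 degrees west
    return (row, col), value


def grid_average(observations: Iterable[Observation],
                 cell_deg: float = 1.0) -> Dict[CellKey, float]:
    """Group by cell key, then reduce each group to its mean value."""
    groups: Dict[CellKey, List[float]] = defaultdict(list)
    for obs in observations:
        key, value = map_to_cell(obs, cell_deg)
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

Keeping the key as a (row, column) pair rather than flattening it to a single integer preserves the spatial neighborhood of each cell, which is the kind of multidimensional spatial locality identified above.
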
All of our milestones will be accomplished if these
techniques prove experimentally successful in a scientific multicore cloud
environment. We believe the culmination of these milestones will make Map
Reduce an attractive tool for the general scientific data management
community, and will broaden the set of scientific problems for which it is