cluster file system (HDFS)
single name node
maps file names to blocks
maps blocks to data nodes
single point of failure (metadata backed up)
multiple data nodes
store blocks (size configurable per file)
blocks potentially replicated (count configurable per file)
replica placement balances rack and node utilization
interface similar to standard file system
files and directories
writing data only once
mapping to C, Java, REST
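
The two name node mappings and the replica placement can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how HDFS is implemented: the class ToyNameNode, its methods, the default block size and replication count, and the greedy placement rule (prefer a rack not yet used for this block, then the least loaded node) are all hypothetical names and simplifications chosen for illustration.

```python
class ToyNameNode:
    """Toy metadata server: holds the two mappings, stores no file data."""

    def __init__(self, racks):
        self.racks = racks        # rack id -> list of data node ids
        self.file_blocks = {}     # file name -> list of block ids
        self.block_nodes = {}     # block id -> data nodes holding a replica
        self._next_block_id = 0

    def _load(self, node):
        # Number of block replicas currently assigned to this data node.
        return sum(node in nodes for nodes in self.block_nodes.values())

    def _place(self, replication):
        # Simplified placement (an assumption, not the real HDFS policy):
        # prefer racks not yet used for this block, then least loaded nodes.
        candidates = [(rack, node)
                      for rack, nodes in self.racks.items() for node in nodes]
        if replication > len(candidates):
            raise ValueError("not enough data nodes for requested replication")
        chosen, used_racks = [], set()
        for _ in range(replication):
            rack, node = min(
                (c for c in candidates if c[1] not in chosen),
                key=lambda c: (c[0] in used_racks, self._load(c[1])),
            )
            chosen.append(node)
            used_racks.add(rack)
        return chosen

    def create(self, name, size, block_size=128, replication=3):
        # Block size and replica count are chosen per file.
        if name in self.file_blocks:
            raise FileExistsError(name)  # data is written only once
        n_blocks = max(1, -(-size // block_size))  # ceiling division
        blocks = []
        for _ in range(n_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            self.block_nodes[block_id] = self._place(replication)
            blocks.append(block_id)
        self.file_blocks[name] = blocks
        return blocks

    def locate(self, name):
        # Resolve name -> blocks -> data nodes, as a client would before reading.
        return [(b, self.block_nodes[b]) for b in self.file_blocks[name]]


# Usage: two racks with two data nodes each, one file split into three blocks.
name_node = ToyNameNode({"r1": ["a", "b"], "r2": ["c", "d"]})
name_node.create("/logs/x", size=300, block_size=128, replication=2)
for block_id, nodes in name_node.locate("/logs/x"):
    print(block_id, nodes)
```

Because all metadata lives in this one object, losing it loses the file system layout, which is why the outline calls the name node a single point of failure.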
cluster MapReduce implementation
single JobTracker node
schedules tasks to nodes
retries failed tasks
multiple TaskTracker nodes
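
The scheduling and retry behavior of the JobTracker can be sketched in a few lines. This is a toy model under stated assumptions, not the Hadoop implementation: ToyJobTracker, the round-robin assignment, the max_attempts limit, and the execute callback are hypothetical and chosen only to illustrate the idea.

```python
from collections import deque


class ToyJobTracker:
    """Toy scheduler: hands tasks to tracker nodes and retries failures."""

    def __init__(self, trackers, max_attempts=4):
        self.trackers = list(trackers)    # ids of the TaskTracker nodes
        self.max_attempts = max_attempts  # assumed retry limit per task

    def run(self, tasks, execute):
        """Run every task; execute(task, tracker) returns a result or raises."""
        pending = deque((task, 1) for task in tasks)
        results = {}
        slot = 0
        while pending:
            task, attempt = pending.popleft()
            # Round-robin assignment (an assumption; real schedulers also
            # consider data locality and free slots).
            tracker = self.trackers[slot % len(self.trackers)]
            slot += 1
            try:
                results[task] = execute(task, tracker)
            except Exception:
                if attempt >= self.max_attempts:
                    raise RuntimeError(f"task {task!r} failed {attempt} times")
                # Requeue the failed task; it will be tried on the next node.
                pending.append((task, attempt + 1))
        return results


def flaky_execute(task, tracker):
    # Hypothetical worker: node "t2" is broken and fails every task.
    if tracker == "t2":
        raise IOError(f"{tracker} is down")
    return (task, tracker)


job_tracker = ToyJobTracker(["t1", "t2", "t3"])
print(job_tracker.run(["m0", "m1", "m2"], flaky_execute))
```

In this sketch the task that lands on the broken node is simply requeued and completes on a different node on its second attempt.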