Programmers only need to define Map and Reduce functions; MR manages and hides all aspects of distribution
Input1 -> Map -> a,1 b,1
Input2 -> Map ->     b,1
Input3 -> Map -> a,1     c,1
                  |   |   |
                  |   |   -> Reduce -> c,1
                  |   -----> Reduce -> b,2
                  ---------> Reduce -> a,2
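MR calls Map() once for each input file; each call produces a set of <k2,v2> pairs (the intermediate data)
MR gathers all the v2 values for a given k2 and passes them to one Reduce() call
The final output is the set of <k2,v3> pairs from the Reduce()s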
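
The flow in the diagram can be sketched as a single-process word count. This is only an illustration, not how a real MR framework is implemented: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, the input contents are made up to match the diagram, and the "framework" here runs sequentially with no distribution or fault tolerance.

```python
from collections import defaultdict

def map_fn(filename, contents):
    # Hypothetical user-defined Map: emit a (k2, v2) pair of (word, 1) per word.
    return [(word, 1) for word in contents.split()]

def reduce_fn(key, values):
    # Hypothetical user-defined Reduce: sum all v2 values for one k2.
    return sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Sequential stand-in for the framework (assumption: everything fits in memory).
    intermediate = defaultdict(list)
    # Map phase: one Map call per input file.
    for name, contents in inputs.items():
        for k2, v2 in map_fn(name, contents):
            intermediate[k2].append(v2)
    # Reduce phase: one Reduce call per distinct k2, yielding <k2,v3>.
    return {k2: reduce_fn(k2, v2s) for k2, v2s in intermediate.items()}

# Inputs chosen to mirror the diagram above.
inputs = {"Input1": "a b", "Input2": "b", "Input3": "a c"}
result = run_mapreduce(inputs, map_fn, reduce_fn)
print(sorted(result.items()))  # [('a', 2), ('b', 2), ('c', 1)]
```

Only `map_fn` and `reduce_fn` are what the programmer would write; everything inside `run_mapreduce` stands in for the part the framework hides.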