If a word occurs multiple times in a line, one token is emitted for each occurrence.
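The map step can be sketched in plain Python as follows. This simulates the logic only and is not Hadoop's Java API. The function name `map_line` is hypothetical, and splitting on whitespace is an assumption, since the exact tokenization used by this project is not shown here.

```python
from typing import List, Tuple


def map_line(line: str) -> List[Tuple[str, int]]:
    """Emit one (word, 1) pair per occurrence of a word in the line.

    Assumption: words are separated by whitespace; no punctuation
    stripping or lower-casing is done here.
    """
    return [(word, 1) for word in line.split()]


if __name__ == "__main__":
    print(map_line("to be or not to be"))
    # [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 1), ('be', 1)]
```

Note that `to` and `be` each appear twice in the output, once per occurrence, rather than being counted by the mapper itself.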
<Text (the word), Iterable<WriteableInteger> (number of occurrences)>
Hadoop has grouped all the `WriteableInteger` values generated by the mapping step that belong to the same `Text (the word)` key into an `Iterable` for us. Thus, for each word that the mappers have discovered, we get a list of numbers. All we have to do is add them up and emit tuples of the form:
<Text (the word), WriteableInteger (total number of occurrences)>
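A minimal plain-Python sketch of the grouping Hadoop performs between map and reduce, followed by the summing reducer. The names `shuffle` and `reduce_word` are hypothetical, and plain `int` values stand in for the `WriteableInteger` counts; in a real job the grouping is done by the framework, not by user code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values that share a key, as the framework does before reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(groups)


def reduce_word(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Add up all counts for one word and emit (word, total)."""
    return word, sum(counts)


if __name__ == "__main__":
    mapped = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
    grouped = shuffle(mapped)  # {'to': [1, 1], 'be': [1, 1], 'or': [1], 'not': [1]}
    print([reduce_word(w, c) for w, c in grouped.items()])
    # [('to', 2), ('be', 2), ('or', 1), ('not', 1)]
```

Because the mapper only ever emits `1`, `sum(counts)` here happens to equal `len(counts)`. Summing is still the right choice, since once a combiner is involved the values reaching the reducer can be larger than `1`.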
The reducer here also acts as the combiner, meaning that a reduction step is also performed locally on the output of each mapper before it is sent to the central reduction step. This way, some word counts are already added up locally: several tuples for the same word are merged into one, which decreases the amount of data that has to be sent to the central reducer. This works in this simple form because the reducer's output has the same form as the mapper's output (only the `WriteableInteger` value is no longer necessarily `1`) and because addition is associative and commutative, so it does not matter in which groups the counts are added up first.
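The following plain-Python sketch (not Hadoop code) illustrates why the same summing logic can serve as both combiner and reducer. The helper `sum_counts` and the two mapper outputs are hypothetical examples; the point is that combining each mapper's output first sends fewer tuples but yields the same final counts.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, int]


def sum_counts(pairs: List[Pair]) -> List[Pair]:
    """Merge all pairs with the same word by adding their counts.

    Input and output have the same (word, count) shape, which is what lets
    this one function act as both combiner and reducer.
    """
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


if __name__ == "__main__":
    # Hypothetical output of two mappers.
    mapper_a = [("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)]
    mapper_b = [("be", 1), ("quick", 1)]

    # Without a combiner, all 8 pairs travel to the central reducer.
    without_combiner = sum_counts(mapper_a + mapper_b)

    # With a combiner, each mapper first merges its own pairs: 4 + 2 = 6 are sent.
    combined = sum_counts(mapper_a) + sum_counts(mapper_b)
    with_combiner = sum_counts(combined)

    print(len(mapper_a + mapper_b), len(combined))         # 8 6
    print(dict(without_combiner) == dict(with_combiner))   # True
```

If the reduce step were not associative and commutative (for example, computing an average directly), running it locally first would change the result, and it could not simply be reused as a combiner.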
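After the reduction step, we therefore know how often each word occurred in the text. Furthermore, since Hadoop sorts the tuples by key before the reduction step, the resulting word/occurrences list is also sorted by word. Strictly speaking, `Text` keys are compared byte by byte on their UTF-8 encoding, so this is not exact dictionary order (for example, words starting with an uppercase letter come before words starting with a lowercase letter), and if the job uses several reducers, each output file is sorted on its own.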
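A small plain-Python sketch of this ordering, assuming keys are compared on their UTF-8 bytes as Hadoop `Text` keys are. The function `sorted_output` is hypothetical, and it orders a single reducer's output; with several reducers each output file would be ordered separately.

```python
from typing import Dict, List, Tuple


def sorted_output(totals: Dict[str, int]) -> List[Tuple[str, int]]:
    """Order (word, count) pairs by the UTF-8 bytes of the word.

    For plain ASCII words this puts every uppercase initial before
    every lowercase one, unlike a case-insensitive dictionary order.
    """
    return sorted(totals.items(), key=lambda item: item[0].encode("utf-8"))


if __name__ == "__main__":
    print(sorted_output({"be": 3, "To": 1, "apple": 2, "Zebra": 1}))
    # [('To', 1), ('Zebra', 1), ('apple', 2), ('be', 3)]
```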