hadoop - Get Top N items from mapper output - MapReduce


My mapper task returns the following output:

    2   c
    2   g
    3
    3   b
    6   r

I have written the reducer code and a KeyComparator, and together they produce correctly sorted output. How do I get only the top 3 (top N by count) out of the mapper output? Here is the code:

    public static class WLReducer2 extends
            Reducer<IntWritable, Text, Text, IntWritable> {

        @Override
        protected void reduce(IntWritable key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            // Swap key and value so each output line reads "word<TAB>count".
            for (Text x : values) {
                context.write(new Text(x), key);
            }
        }
    }

    public static class KeyComparator extends WritableComparator {
        protected KeyComparator() {
            super(IntWritable.class, true);
        }

        @Override
        public int compare(WritableComparable w1, WritableComparable w2) {
            IntWritable ip1 = (IntWritable) w1;
            IntWritable ip2 = (IntWritable) w2;
            // Negate the natural order so larger counts sort first.
            return -1 * ip1.compareTo(ip2);
        }
    }

This is the reducer output:

    r   6
    b   3
        3
    g   2
    c   2

The expected reducer output for the top 3 by count is:

    r   6
    b   3
        3

You can restrict the output in the reducer, like this:

    public static class WLReducer2 extends
            Reducer<IntWritable, Text, Text, IntWritable> {

        // Instance field: shared by every reduce() call in this reducer task.
        int count = 0;

        @Override
        protected void reduce(IntWritable key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            for (Text x : values) {
                // Write only the first 3 records; keys arrive in descending order.
                if (count < 3) {
                    context.write(new Text(x), key);
                }
                count++;
            }
        }
    }
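To see the intended cutoff in isolation, here is a minimal plain-Java sketch that does not use any Hadoop classes. It assumes a reversed TreeMap as a stand-in for the descending sort, and the class name TopNSketch, the method topN, and the word "x" in the sample data are hypothetical. Values that share a count come out in insertion order here, whereas Hadoop does not guarantee any order among the values of one key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the reducer logic; no Hadoop classes are used.
class TopNSketch {

    // Groups words by count in descending key order (the role the sort
    // comparator plays in the job), then keeps only the first n records.
    static List<String> topN(int[] counts, String[] words, int n) {
        // Reversed comparator: the largest count is visited first.
        TreeMap<Integer, List<String>> grouped =
                new TreeMap<>(Comparator.reverseOrder());
        for (int i = 0; i < counts.length; i++) {
            grouped.computeIfAbsent(counts[i], k -> new ArrayList<>())
                   .add(words[i]);
        }

        List<String> out = new ArrayList<>();
        int emitted = 0; // plays the role of the reducer's counter field
        for (Map.Entry<Integer, List<String>> e : grouped.entrySet()) {
            for (String w : e.getValue()) {
                if (emitted < n) {
                    out.add(w + "\t" + e.getKey());
                }
                emitted++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] counts = {2, 2, 3, 3, 6};
        String[] words = {"c", "g", "x", "b", "r"}; // "x" is a placeholder
        topN(counts, words, 3).forEach(System.out::println);
    }
}
```

The counter runs across all keys rather than being reset per key, which is why the cutoff applies to the output as a whole and not to each count group.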

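Also set the number of reducers to 1 with job.setNumReduceTasks(1). The counter is kept per reducer instance, so with more than one reducer each of them would emit its own top 3.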
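Why a single reducer matters can be illustrated with another small plain-Java sketch. It is an illustration under assumptions, not Hadoop code: the class ReducerCountSketch and the method emitted are hypothetical, and a simple modulo stands in for the job's partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only; partitioning is a simple modulo,
// standing in for whatever partitioner the job uses.
class ReducerCountSketch {

    // Splits counts across `reducers` partitions, lets each partition keep
    // its own top n (descending), and returns everything that was emitted.
    static List<Integer> emitted(List<Integer> counts, int reducers, int n) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int r = 0; r < reducers; r++) {
            partitions.add(new ArrayList<>());
        }
        for (int c : counts) {
            partitions.get(c % reducers).add(c);
        }

        List<Integer> out = new ArrayList<>();
        for (List<Integer> p : partitions) {
            p.sort(Collections.reverseOrder());
            // Each reducer has its own counter, so each keeps up to n records.
            out.addAll(p.subList(0, Math.min(n, p.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(2, 2, 3, 3, 6);
        System.out.println(emitted(counts, 1, 3)); // [6, 3, 3]
        System.out.println(emitted(counts, 2, 3)); // [6, 2, 2, 3, 3]
    }
}
```

With one partition the result is the global top 3. With two partitions, five records are emitted, because each partition applies the cutoff to its own share of the keys.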

