HBase/Java error reduce flushing

While executing a Hadoop MapReduce job written in Java, the reduce phase failed and returned this error:

Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#4

This error has been reported as a bug, and the suggested workaround is to decrease the value of "mapreduce.reduce.shuffle.input.buffer.percent". According to the official documentation, the default value is 0.70; we decided to lower it to 0.50.
To apply this value in the Java code of the job, I added the configuration setting shown next. It uses the older property name "mapred.job.shuffle.input.buffer.percent", which Hadoop 2 keeps as a deprecated alias of the same setting:

conf.set("mapred.job.shuffle.input.buffer.percent", "0.50");
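To get a feeling for what the change means in bytes, here is a minimal sketch. It assumes that the in-memory shuffle buffer is simply the reducer's maximum heap multiplied by this fraction, which is how the Hadoop documentation describes the property. The 1 GiB heap and the names ShuffleBufferSketch and bufferBytes are illustrative assumptions, not values or code from the job above.

```java
// Sketch: how the shuffle input buffer fraction scales the reducer's in-memory buffer.
// The 1 GiB heap is an assumed example value, not taken from the job in this post.
class ShuffleBufferSketch {

    // Bytes of reducer heap that may hold fetched map outputs in memory,
    // assuming buffer = max heap * fraction (truncated to whole bytes).
    static long bufferBytes(long maxHeapBytes, double inputBufferPercent) {
        if (inputBufferPercent < 0.0 || inputBufferPercent > 1.0) {
            throw new IllegalArgumentException(
                "fraction must be in [0, 1]: " + inputBufferPercent);
        }
        return (long) (maxHeapBytes * inputBufferPercent);
    }

    public static void main(String[] args) {
        long assumedHeap = 1L << 30; // hypothetical 1 GiB reducer heap
        System.out.println("0.70 -> " + bufferBytes(assumedHeap, 0.70) + " bytes");
        System.out.println("0.50 -> " + bufferBytes(assumedHeap, 0.50) + " bytes");
    }
}
```

With the lower fraction, less of the heap is reserved for fetched map outputs, so more of the shuffle goes to disk but the reducer is less likely to run out of memory while fetching.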

Note: this is the console output of the MapReduce job. Some of the counters are annotated with their approximate size in GB.

File System Counters
                FILE: Number of bytes read=486390889222
                FILE: Number of bytes written=687468686946
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=198283602026 ---> 184 GB
                HDFS: Number of bytes written=207799910639 --> 193 GB
                HDFS: Number of read operations=4488
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=41
        Job Counters
                Killed map tasks=3
                Killed reduce tasks=1
                Launched map tasks=1479
                Launched reduce tasks=21
                Data-local map tasks=1476
                Rack-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=83317707
                Total time spent by all reduces in occupied slots (ms)=104222174
                Total time spent by all map tasks (ms)=83317707
                Total time spent by all reduce tasks (ms)=52111087
                Total vcore-seconds taken by all map tasks=83317707
                Total vcore-seconds taken by all reduce tasks=52111087
                Total megabyte-seconds taken by all map tasks=42658665984
                Total megabyte-seconds taken by all reduce tasks=53361753088
        Map-Reduce Framework
                Map input records=142741637
                Map output records=142741637
                Map output bytes=200040264798
                Map output materialized bytes=200896701993 
                Input split bytes=202212
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=200896701993 --> 187 GB
                Reduce input records=142741637
                Reduce output records=142741637
                Spilled Records=419903970
                Shuffled Maps =29520
                Failed Shuffles=0
                Merged Map outputs=29520
                GC time elapsed (ms)=7061006
                CPU time spent (ms)=55573260
                Physical memory (bytes) snapshot=556034654208
                Virtual memory (bytes) snapshot=3712997023744
                Total committed heap usage (bytes)=512887357440 --> 477 GB
        Shuffle Errors
        File Input Format Counters
                Bytes Read=198283399814
        File Output Format Counters
                Bytes Written=207799909871
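
The GB annotations in the output above can be reproduced with a small conversion. This is only a sketch: it assumes that "GB" here means binary gigabytes (2^30 bytes) rounded down, which matches the four annotated values. The class name CounterSizes and the method wholeGiB are made up for the illustration.

```java
// Sketch: convert the byte counters from the job output into whole binary gigabytes.
// Assumption: the "GB" annotations are 2^30-byte units, rounded down.
class CounterSizes {

    static final long GIB = 1L << 30; // 1,073,741,824 bytes

    // Whole gigabytes (2^30 bytes) contained in the given byte count.
    static long wholeGiB(long bytes) {
        return bytes / GIB;
    }

    public static void main(String[] args) {
        // Counter values copied from the console output.
        long hdfsBytesRead = 198283602026L;
        long hdfsBytesWritten = 207799910639L;
        long reduceShuffleBytes = 200896701993L;
        long committedHeapBytes = 512887357440L;

        System.out.println("HDFS bytes read:      " + wholeGiB(hdfsBytesRead) + " GB");
        System.out.println("HDFS bytes written:   " + wholeGiB(hdfsBytesWritten) + " GB");
        System.out.println("Reduce shuffle bytes: " + wholeGiB(reduceShuffleBytes) + " GB");
        System.out.println("Committed heap:       " + wholeGiB(committedHeapBytes) + " GB");
    }
}
```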