I’ve been trying to run the analytics pipeline on a single-node Hadoop cluster created in an OpenStack instance, but I always get the same error:
INFO mapreduce.Job: Job job_1612970692718_0016 failed with state KILLED due to: REDUCE capability required is more than the supported max container capability in the cluster. Killing the Job. reduceResourceRequest: <memory:5120, vCores:1> maxContainerCapability:<memory:2048, vCores:32>
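In case it helps to frame the question, my understanding of the error is: the 5120 MB is the memory the reduce task requests (mapreduce.reduce.memory.mb, set in mapred-site.xml or passed at job submission time), while the 2048 MB is the largest container YARN is willing to grant (yarn.scheduler.maximum-allocation-mb, which in practice is also bounded by yarn.nodemanager.resource.memory-mb). The property names below are the standard Hadoop ones, not anything specific to the edX pipeline, and the values are only illustrative for a 16 GB node, not my actual configuration:

<!-- yarn-site.xml: cluster-side limits (illustrative values for a 16 GB node) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value>  <!-- total memory the NodeManager can hand out to containers -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>   <!-- largest single container the scheduler will grant -->
</property>

<!-- mapred-site.xml: job-side request -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>5120</value>   <!-- must stay at or below yarn.scheduler.maximum-allocation-mb -->
</property>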
After doing some research on the internet, I found that it might be related to adjusting the settings described here: configuration/main.yml at open-release/juniper.master · edx/configuration · GitHub
I set those variables and was able to see the changes in the Hadoop configuration files (/edx/app/hadoop/hadoop/etc/hadoop/yarn-site.xml and /edx/app/hadoop/hadoop/etc/hadoop/mapred-site.xml). I also connected to the Hadoop web UI exposed on port 8088 and, under the conf tab, I could see the right values (I rebooted the server to restart all the Hadoop services). However, after trying several different combinations of settings, I always got the same error: the reduceResourceRequest and maxContainerCapability values never changed. This made me think that the Hadoop settings were being overridden or simply ignored, but I could not figure out by what. I even tried a bigger instance size (more RAM and vCPUs), but the error remained the same.
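One hypothesis I have not been able to confirm: as far as I know, properties passed by the job client at submission time (e.g. -D mapreduce.reduce.memory.mb=5120) take precedence over the values in the site XML files, so if the pipeline submits its own value, the site files would effectively be ignored. Hadoop lets an administrator mark a property as final in the site file so that user applications cannot override it; something like the sketch below is what I was considering trying, but I have not verified how it interacts with the analytics pipeline:

<!-- mapred-site.xml: mark the value final so job-submission-time overrides are ignored -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>   <!-- illustrative value that fits under the 2048 MB container cap -->
  <final>true</final>
</property>

Does anyone know whether the pipeline sets mapreduce.reduce.memory.mb itself when it submits jobs? That would explain why the request is always 5120 regardless of what I put in mapred-site.xml.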
Some other important details are:
Analytics Pipeline Version: open-release/juniper.master
Hadoop single-node cluster instance size: t1.xlarge (16 GB RAM, 4 vCPUs) or t1.2xlarge (32 GB RAM, 8 vCPUs); I tried both and saw the same error with each.
Configuration files: edx-analytics-pipeline-hadoop-issue · GitHub
I am not a Hadoop expert and I don’t understand why I keep getting the same error, which seems to come from the Hadoop MapReduce client code (hadoop/RMContainerAllocator.java at 1e3a6efcef2924a7966c44ca63476c853956691d · apache/hadoop · GitHub). I hope someone here has experienced a similar issue and can help me fix it. Thanks in advance.