
Why am I getting a "Failed to start MapReduce job" error running YARN and Hunk 6.2?

p_gurav
Champion

Hi,

I run a search as index=testvi and everything works fine, but when I add any condition, I get a "Failed to start MapReduce job" error. I looked at search.log, which contains the following:

04-23-2015 11:03:01.029 ERROR ERP.testprovider -  com.splunk.mr.JobStartException: Failed to start MapReduce job.  Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_200-01-06.sc1.verticloud.com_1429786977.431_0 ] and [ file /tmp/hadoop-yarn/staging/hunk/.staging/job_1428964782359_0317/job.jar.
04-23-2015 11:03:01.029 ERROR ERP.testprovider -  Requested replication 10 exceeds maximum 4
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:939)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInt(FSNamesystem.java:2101)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2092)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setReplication(NameNodeRpcServer.java:551)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setReplication(ClientNamenodeProtocolServerSideTranslatorPB.java:388)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at java.security.AccessController.doPrivileged(Native Method)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at javax.security.auth.Subject.doAs(Subject.java:415)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -   ]
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.startJob(JobSubmitter.java:659)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.startJobForAvailableSplits(JobSubmitter.java:712)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.finished(JobSubmitter.java:702)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR$SearchHandler.executeMapReduce(SplunkMR.java:1197)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:928)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:771)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1518)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR.run(SplunkMR.java:1300)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR.main(SplunkMR.java:1546)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -  Caused by: java.lang.RuntimeException: Failed to start MapReduce job, name=SPLK_200-01-06.sc1.verticloud.com_1429786977.431_0
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.startJobImpl(JobSubmitter.java:630)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.startJob(JobSubmitter.java:655)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    ... 10 more
04-23-2015 11:03:01.352 WARN  SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
04-23-2015 11:03:01.352 INFO  ERPSearchResultCollector - ERP peer=testprovider is done reading search results.

Following are my provider settings:

[provider:testprovider]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/hadoop
vix.env.JAVA_HOME = /usr/java/default
vix.family = hadoop
vix.fs.default.name = hdfs://XXXXX:8020
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/hunk/
vix.splunk.setup.package = /opt/hunk-6.2.2-257696-Linux-x86_64.tar
vix.yarn.resourcemanager.address = XXXXXX:8032
vix.yarn.resourcemanager.scheduler.address = XXXXXX:8030

Can anybody please help me with this? Thanks in advance.

1 Solution

rdagan_splunk
Splunk Employee

The error - Requested replication 10 exceeds maximum 4 - seems to be what is stopping you from running MR jobs.

The value of 10 most likely comes from the mapred-site.xml flag mapreduce.client.submit.file.replication, and I suspect that the limit of 4 comes from the hdfs-site.xml flag dfs.replication.max (which defaults to 512).
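For reference, the two settings typically live in the Hadoop config files on the cluster and look roughly like this (the values 10 and 4 are just the ones from your error message; your files may differ):

mapred-site.xml:
  <property>
    <name>mapreduce.client.submit.file.replication</name>
    <value>10</value>  <!-- replication requested for submitted job files; must not exceed dfs.replication.max -->
  </property>

hdfs-site.xml:
  <property>
    <name>dfs.replication.max</name>
    <value>4</value>  <!-- maximum replication the NameNode will accept -->
  </property>

Lowering the first value to something at or below the second (or raising dfs.replication.max) should let the job start.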


jmallorquin
Builder

Hi,

I have a quite similar issue. When I query a virtual index, the search head starts to show events but stops once it has matched 36M. If I count the events in the two log files we are querying, the total is 66M lines.

When the query finishes, it shows this error message:

[pruebas] Exception - com.splunk.mr.JobStartException: Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_Cloudera04_1480957020.4_0 ] and [ Call From Cloudera04/172.20.1.235 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused ] 
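For context, the 0.0.0.0:8032 in the message makes me think the job client is not picking up the ResourceManager address. A sketch of the provider settings I am comparing against in indexes.conf (hostnames are placeholders, not my real values):

[provider:myprovider]
vix.family = hadoop
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = resourcemanager-host:8032
vix.yarn.resourcemanager.scheduler.address = resourcemanager-host:8030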

Any advice?

Thanks in advance.


p_gurav
Champion

Thank you for your help, @rdagan_splunk. 🙂 After overriding "mapreduce.client.submit.file.replication", the problem was resolved.
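In case it helps anyone else, a sketch of one way to apply such an override directly in the provider stanza (assuming the usual vix. pass-through of Hadoop properties to the job; 3 is only an example value at or below the cluster's dfs.replication.max):

[provider:testprovider]
# passed through to the MapReduce job as mapreduce.client.submit.file.replication
vix.mapreduce.client.submit.file.replication = 3

The same change can also be made in mapred-site.xml on the cluster side.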
