Dashboards & Visualizations

Why am I getting a "Failed to start MapReduce job" error running YARN and Hunk 6.2?

p_gurav
Champion

Hi,

I run the search index=testvi and everything works fine, but when I add any condition I get a "Failed to start MapReduce job" error. I looked through search.log, which shows the following:

04-23-2015 11:03:01.029 ERROR ERP.testprovider -  com.splunk.mr.JobStartException: Failed to start MapReduce job.  Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_200-01-06.sc1.verticloud.com_1429786977.431_0 ] and [ file /tmp/hadoop-yarn/staging/hunk/.staging/job_1428964782359_0317/job.jar.
04-23-2015 11:03:01.029 ERROR ERP.testprovider -  Requested replication 10 exceeds maximum 4
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.verifyReplication(BlockManager.java:939)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplicationInt(FSNamesystem.java:2101)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setReplication(FSNamesystem.java:2092)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setReplication(NameNodeRpcServer.java:551)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setReplication(ClientNamenodeProtocolServerSideTranslatorPB.java:388)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at java.security.AccessController.doPrivileged(Native Method)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at javax.security.auth.Subject.doAs(Subject.java:415)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -   ]
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.startJob(JobSubmitter.java:659)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.startJobForAvailableSplits(JobSubmitter.java:712)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.finished(JobSubmitter.java:702)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR$SearchHandler.executeMapReduce(SplunkMR.java:1197)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:928)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:771)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1518)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR.run(SplunkMR.java:1300)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.SplunkMR.main(SplunkMR.java:1546)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -  Caused by: java.lang.RuntimeException: Failed to start MapReduce job, name=SPLK_200-01-06.sc1.verticloud.com_1429786977.431_0
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.startJobImpl(JobSubmitter.java:630)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    at com.splunk.mr.JobSubmitter.startJob(JobSubmitter.java:655)
04-23-2015 11:03:01.029 ERROR ERP.testprovider -    ... 10 more
04-23-2015 11:03:01.352 WARN  SearchOperator:kv - buildRegexList provided empty conf key, ignoring.
04-23-2015 11:03:01.352 INFO  ERPSearchResultCollector - ERP peer=testprovider is done reading search results.

Here are my provider settings:

[provider:testprovider]
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /opt/hadoop
vix.env.JAVA_HOME = /usr/java/default
vix.family = hadoop
vix.fs.default.name = hdfs://XXXXX:8020
vix.mapreduce.framework.name = yarn
vix.output.buckets.max.network.bandwidth = 0
vix.splunk.home.hdfs = /user/hunk/
vix.splunk.setup.package = /opt/hunk-6.2.2-257696-Linux-x86_64.tar
vix.yarn.resourcemanager.address = XXXXXX:8032
vix.yarn.resourcemanager.scheduler.address = XXXXXX:8030

Can anybody please help me with this? Thanks in advance.

1 Solution

rdagan_splunk
Splunk Employee

The error "Requested replication 10 exceeds maximum 4" is what is stopping your MapReduce jobs from running.

The value of 10 most likely comes from the mapred-site.xml property called mapreduce.client.submit.file.replication.
I suspect that the limit of 4 comes from the hdfs-site.xml property called dfs.replication.max (which defaults to 512).
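
As a rough sketch (the exact value depends on your cluster; 3 is just an example that stays at or below your dfs.replication.max of 4), lowering the submit replication in the mapred-site.xml used by the Hunk search head could look like this:

<!-- mapred-site.xml: keep the replication of submitted job files at or below dfs.replication.max -->
<property>
  <name>mapreduce.client.submit.file.replication</name>
  <value>3</value>
</property>

Alternatively, raising dfs.replication.max in hdfs-site.xml on the NameNode to 10 or more should also clear the error.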


jmallorquin
Builder

Hi,

I have quite a similar issue. When I query a virtual index, the search head starts showing events but stops after matching 36M.
If I count the events in the two log files we are querying, the total is 66M lines.

When the query finishes, it shows this error message:

[pruebas] Exception - com.splunk.mr.JobStartException: Failed to start MapReduce job. Please consult search.log for more information. Message: [ Failed to start MapReduce job, name=SPLK_Cloudera04_1480957020.4_0 ] and [ Call From Cloudera04/172.20.1.235 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused ] 

Any advice?

Thanks in advance.

0 Karma

p_gurav
Champion

Thank you for your help, @rdagan_splunk. 🙂 After overriding "mapreduce.client.submit.file.replication", the problem was resolved.
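
For anyone hitting the same error: Hadoop properties can be passed through to a Hunk provider by prefixing them with vix. in the provider stanza, so the override could look something like this (the value 3 is only an illustration; it just needs to be at or below the cluster's dfs.replication.max):

[provider:testprovider]
# hypothetical override; pick any value <= dfs.replication.max (4 on this cluster)
vix.mapreduce.client.submit.file.replication = 3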
