All Apps and Add-ons

What block replication factor does Hunk use in its MR jobs? (under replicated blocks)

alexmc
Explorer

I am seeing lots of under-replicated blocks in my Hadoop cluster. Its main client is Hunk (HDP 2.2 and Hunk 6.6.1, I think).
When I run hdfs fsck / I see that the blocks in question look like they were created by Hunk, e.g.

/user/splunk/.staging/job_1424956467914_0015/job.jar: Under replicated BP-1255772799-10.34.37.1-1421676659908:blk_1073768329_27512. Target Replicas is 10 but found 5 replica(s).

(I used the splunk user before switching on user proxying.)

Now what I would like to know is: what is setting the target replica count to 10? I want to remove that or manually lower it to 3. I only have five data nodes, so 10 copies of a block is impossible.
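For files that are already under-replicated, the target can be lowered with standard HDFS tooling. A sketch, assuming the affected files live under /user/splunk/.staging as in the fsck output above (paths and the value 3 are examples, adjust to your cluster):

```
# Summarise under-replicated blocks reported by fsck
hdfs fsck / | grep -i 'under replicated'

# Lower the replication target on the existing files to 3;
# -w waits until the new factor is actually reached
hdfs dfs -setrep -w 3 /user/splunk/.staging
```

This only fixes existing files; new job submissions will still use whatever replication factor the client requests unless the configuration is changed.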

I can't see anything suitable in indexes.conf.

1 Solution

apatil_splunk
Splunk Employee

Hunk uses the replication factor set in the configuration of the Hadoop client on the search head.


lloydd518
Path Finder

Hunk does use the replication factor set in the configuration of the Hadoop client on the search head. However, if you haven't set this to match the replication factor of your Hadoop cluster, the cluster will grumble about under-replicated blocks caused by your Hunk searches, because those searches will be requesting an arbitrary number like 10.

A way to override this, resolving the issue or preventing it in the future, is to add the following line to your virtual index provider settings:

vix.mapreduce.client.submit.file.replication = 1 (or whatever your Hadoop cluster's replication factor is)
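Provider settings with the vix. prefix map directly onto Hadoop client configuration, so the equivalent client-side fix is to set mapreduce.client.submit.file.replication (which defaults to 10 in Hadoop, matching the "Target Replicas is 10" seen in fsck) in mapred-site.xml on the search head. A sketch, assuming a cluster replication factor of 3:

```
<!-- mapred-site.xml on the search head's Hadoop client.
     Assumption: 3 matches the cluster's dfs.replication; adjust to yours. -->
<property>
  <name>mapreduce.client.submit.file.replication</name>
  <value>3</value>
</property>
```

Either approach works; the vix.* provider setting has the advantage of being scoped to Hunk searches only.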


apatil_splunk
Splunk Employee

Hunk uses the replication factor set in the configuration of the Hadoop client on the search head.

rdagan_splunk
Splunk Employee

Exactly. Hunk just uses the default unless you override these flags in the provider, for example vix.yarn.resourcemanager.classpath.


alexmc
Explorer

They weren't the issue, but it was good to check, thanks.

Further investigation showed that the problem was NOT limited to Hunk-created jobs; it also appeared on other MapReduce jobs unrelated to Hunk. This is not a Hunk problem.

So, to answer my own question: it uses the default!

