What block replication factor does Hunk use in its MR jobs? (under replicated blocks)

alexmc
Explorer

I am seeing lots of under-replicated blocks in my Hadoop cluster. Its main client is Hunk (HDP 2.2 and Hunk 6.6.1, I think).
When I run hdfs fsck /, the blocks in question look like they were created by Hunk, e.g.:

/user/splunk/.staging/job_1424956467914_0015/job.jar: Under replicated BP-1255772799-10.34.37.1-1421676659908:blk_1073768329_27512. Target Replicas is 10 but found 5 replica(s).

(I used user splunk before switching on user proxying)

What I would like to know is: what is setting the target replica count to 10? I want to remove that setting or manually lower it to 3. I only have 5 data nodes, so 10 copies of a block are impossible.

I can't see anything suitable in indexes.conf.

apatil_splunk
Splunk Employee

Hunk uses the replication factor set in the Hadoop client configuration on the search head.

lloydd518
Path Finder

Hunk does use the replication factor set in the Hadoop client configuration on the search head. However, if you haven't set it to match the replication factor of your Hadoop cluster, the cluster will complain about under-replicated blocks caused by your Hunk searches, because those searches will probably request an arbitrary number like 10.

One way to override this, and to resolve the issue or prevent it in the future, is to add the following line to your virtual index provider settings:

vix.mapreduce.client.submit.file.replication = 1 (or whatever your Hadoop cluster's replication factor is)
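
For context, the Hadoop property mapreduce.client.submit.file.replication controls the replication of job submission files such as job.jar under .staging, and its stock default is 10, which matches the "Target Replicas is 10" in the fsck output above. A minimal sketch of how the override might look in indexes.conf, assuming a provider stanza named my-hadoop-provider (a hypothetical name, use your own) and a cluster replication factor of 3:

```
# indexes.conf on the Hunk search head
# "my-hadoop-provider" is a placeholder; use your existing provider stanza.
[provider:my-hadoop-provider]
vix.family = hadoop
# Replication for job submission files (job.jar etc.) in the staging directory.
# 3 is an assumed value; set it to a number your cluster can actually satisfy.
vix.mapreduce.client.submit.file.replication = 3
```

Files that were already written with a target of 10 keep that target, so this would only affect jobs submitted after the change.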

rdagan_splunk
Splunk Employee

Exactly. Hunk just uses the default unless you override these flags in the provider. For example, VIX.yarn.resourcemanager.classpath

alexmc
Explorer

They weren't the issue - but it was good to check, thanks.

Further investigation showed that the problem was NOT limited to Hunk-created jobs; it also appeared on some other MapReduce jobs unrelated to Hunk. This is not a Hunk problem.

So - to answer my own question - it uses the default!
