What block replication factor does Hunk use in its MR jobs? (under replicated blocks)

alexmc
Explorer

I am seeing lots of under-replicated blocks in my Hadoop cluster. Its main client is Hunk (HDP 2.2 and Hunk 6.6.1, I think).
When I run hdfs fsck / I see that the blocks in question look like they were created by Hunk, e.g.

/user/splunk/.staging/job_1424956467914_0015/job.jar: Under replicated BP-1255772799-10.34.37.1-1421676659908:blk_1073768329_27512. Target Replicas is 10 but found 5 replica(s).

(I used the splunk user before switching on user proxying.)

Now what I would like to know is: what is setting the target replica count to 10? I want to remove that setting or manually change it down to 3. I only have 5 data nodes, so 10 copies of a block is impossible.

I can't see anything suitable in indexes.conf.
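
(For the files that are already there, I can presumably knock the replication back down by hand with hdfs dfs -setrep, e.g. something like the line below, where the path is just an example taken from the fsck output; but I'd still like to stop new job files being written with a target of 10.)

hdfs dfs -setrep 3 /user/splunk/.staging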

lloydd518
Path Finder

Hunk does use the replication factor set in the configuration of the Hadoop client on the search head. However, if you haven't set this to match the replication factor of your Hadoop cluster, your cluster will grumble about under-replicated blocks caused by your Hunk searches, because those searches will probably be requesting an arbitrary number like 10.

A way to override this, and resolve the issue or prevent it in the future, is to add the following line to your virtual index provider settings:

vix.mapreduce.client.submit.file.replication = 1 (or whatever your Hadoop cluster's replication factor is)
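
For example, in indexes.conf on the search head this goes into the provider stanza (the stanza name below is just a placeholder), here matching a cluster replication factor of 3:

[provider:my_hadoop_provider]
vix.mapreduce.client.submit.file.replication = 3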

apatil_splunk
Splunk Employee

Hunk uses the replication factor set in the configuration of the Hadoop client on the search head.
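
For what it's worth, the Hadoop default for mapreduce.client.submit.file.replication is 10, which lines up with the "Target Replicas is 10" message on job.jar above. Assuming your provider points at the usual Hadoop client configuration directory, something like this in that client's mapred-site.xml should bring it in line with a 5-node cluster:

<property>
  <name>mapreduce.client.submit.file.replication</name>
  <value>3</value>
</property>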

rdagan_splunk
Splunk Employee

Exactly. Hunk just uses the Hadoop client defaults unless you override these flags in the provider settings, for example vix.yarn.resourcemanager.classpath.

alexmc
Explorer

They weren't the issue - but it was good to check, thanks.

Further investigation showed that the problem was NOT limited to Hunk-created jobs; it also appeared on other MapReduce jobs unrelated to Hunk. This is not a Hunk problem.

So - to answer my own question - it uses the default!
