How to correlate a MapReduce job to a Hunk process?

suarezry
Builder

We're running Hunk v6.2.2 connected to a Hortonworks Hadoop v2.2.4.2 cluster. I noticed that MapReduce jobs fire off from Hunk all the time without any user searches happening; I have 14 pending MapReduce jobs at the moment. How do I determine what in Hunk is firing off these MapReduce jobs?

suarezry
Builder

Hunk jobs will also run if you've installed any Splunk apps that contain scheduled searches with "index=*". Those searches also hit the Hadoop virtual index. Here are the culprits in my environment:
splunk_app_microsoft_exchange
splunk_app_windows_infrastructure
sideview_utils

The workaround is to change the scheduled searches in those Splunk apps from "index=*" to "(index=* AND NOT index=myhadoopprovider)".
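As a rough illustration of that workaround, here is a minimal Python sketch that rewrites a search string so a bare wildcard index term excludes the virtual index. The helper name `exclude_virtual_index` and the regular expression are assumptions made for this example, not part of Splunk or Hunk, and the pattern only covers the simple unquoted form shown in this thread.

```python
import re

# Matches a bare wildcard index term (index=*), but not index=*foo or _index=*.
# This pattern is an assumption for illustration; it ignores quoted forms.
WILDCARD_INDEX = re.compile(r"\bindex\s*=\s*\*(?![\w*-])")


def exclude_virtual_index(search: str, virtual_index: str) -> str:
    """Return the search with every bare index=* term excluding virtual_index."""
    exclusion = f"NOT index={virtual_index}"
    if exclusion in search:
        # Already rewritten; leave it alone so the helper is safe to re-run.
        return search
    return WILDCARD_INDEX.sub(lambda m: f"(index=* AND {exclusion})", search)


original = "index=* eventtype=msexchange-database-stats | stats count"
print(exclude_virtual_index(original, "myhadoopprovider"))
```

The rewritten string would still need to go into the app's saved search definition by hand. Putting the override in the app's local directory rather than editing default is the usual Splunk practice, since default is replaced on app upgrade.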

rdagan_splunk
Splunk Employee

Hunk runs jobs only if you schedule them (they start according to the schedule), enable report acceleration on them (summaries refresh every 10 minutes by default), or manually run reports, dashboards, or searches.
Since you do not see any jobs starting from within your Hunk search head (look under Jobs in the Hunk UI), are there any other Hunk search heads that connect to your Hortonworks cluster?

suarezry
Builder

Ok, so I'm seeing this log entry in vix.splunk.home.hdfs/dispatch/scheduler_admin_c3BsdW5rX2FwcF9taWNyb3NvZnRfZXhjaGFuZ2U_RMD5f2faa9386d1f44b5_at_1433340300_2932/0/dispatch_dirs/SplunkMR_attempt_1433169589737_2064_m_000001_3/search.log

06-03-2015 10:10:03.000 ERROR dispatchRunner - RunDispatch::runDispatchThread threw error: Application does not exist: splunk_app_microsoft_exchange

We are also licensed for the Splunk App for Microsoft Exchange. I'm seeing all sorts of searches for "index=*" in $SPLUNK_HOME/etc/apps/splunk_app_microsoft_exchange/default/savedsearches.conf. Is this what's firing off all the MapReduce jobs?

Here's a sample:
[Lookup - Database Information]
search = index=* eventtype=msexchange-database-stats | stats latest(Active) as Active,latest(MasterType) as MasterType by host,Database | eval key = host . "__" . Database | outputlookup dbInformation append=true
cron_schedule = 30 */4 * * *
dispatch.earliest_time = -8h
dispatch.latest_time = now
enableSched = true
run_on_startup = true
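To find stanzas like this one across an app, a small script can list every scheduled saved search whose search string contains a bare index=*. The sketch below is a simplified reader written for this example. It assumes plain `key = value` lines and does not handle backslash line continuations, a `[default]` stanza, or default/local layering the way Splunk does. The function names are hypothetical and the sample text is abbreviated from the stanza above.

```python
import re

# Bare wildcard index term; an illustrative pattern, not Splunk's own parser.
WILDCARD_INDEX = re.compile(r"\bindex\s*=\s*\*(?![\w*-])")


def parse_stanzas(conf_text):
    """Tiny .conf reader: [stanza] headers and single-line key = value pairs."""
    stanzas, current = {}, None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = stanzas.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:
            # Split on the first "=" only, so the search string keeps its own.
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def scheduled_wildcard_searches(conf_text):
    """Return (stanza name, cron schedule) for scheduled index=* searches."""
    hits = []
    for name, settings in parse_stanzas(conf_text).items():
        scheduled = settings.get("enableSched", "0").lower() in ("1", "true")
        if scheduled and WILDCARD_INDEX.search(settings.get("search", "")):
            hits.append((name, settings.get("cron_schedule", "")))
    return hits


SAMPLE = """\
[Lookup - Database Information]
search = index=* eventtype=msexchange-database-stats | stats latest(Active) as Active by host,Database
cron_schedule = 30 */4 * * *
enableSched = true
"""

print(scheduled_wildcard_searches(SAMPLE))
# [('Lookup - Database Information', '30 */4 * * *')]
```

Running this over the savedsearches.conf of each installed app (reading the file with `open(path).read()`) would give a candidate list of searches to review.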

Ledion_Bitincka
Splunk Employee

Yes, index=* matches any index that the user (in this case, system) is allowed to search, except internal indexes (the ones whose names start with _, such as _internal).
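The matching rule described here can be pictured with a toy Python model. This is only a sketch of the behaviour as stated in this reply, not how Splunk implements index resolution. The index list is made up for illustration, and the handling of patterns that themselves start with an underscore is an assumption.

```python
from fnmatch import fnmatchcase


def indexes_matched(pattern, searchable_indexes):
    """Toy model: wildcard patterns skip internal (underscore-prefixed) indexes."""
    # Assumption: a pattern that itself starts with "_" targets internal indexes.
    explicit_internal = pattern.startswith("_")
    return [
        name
        for name in searchable_indexes
        if fnmatchcase(name, pattern)
        and (explicit_internal or not name.startswith("_"))
    ]


# Hypothetical set of indexes the search user is allowed to search.
indexes = ["main", "_internal", "_audit", "myhadoopprovider"]

print(indexes_matched("*", indexes))
# ['main', 'myhadoopprovider']
```

In this model the virtual index is swept up by the wildcard like any other non-internal index, which is why a scheduled index=* search ends up launching work on the Hadoop side.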

apatil_splunk
Splunk Employee

Can you please provide an example name of one of the pending jobs? Also, can you please check whether you have any report acceleration summary-building jobs or scheduled saved searches that use the virtual index?

suarezry
Builder

Here's the name of one job: SPLK_myhost.mydomain.ca_scheduler_admin_c3BsdW5rX2FwcF9taWNyb3NvZnRfZXhjaGFuZ2U_RMD5ba20a3b8ac6badf9_at_1433178480_338_0
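A job name like this can be picked apart to see where it came from. The segment between the user name and RMD5 is valid base64 and decodes to splunk_app_microsoft_exchange, which points at the app that owns the search. The Python sketch below does that decoding. The field layout in the regular expression is inferred from this one job name rather than taken from Splunk documentation, the meaning of the number after _at_ (treated here as an epoch timestamp) is likewise an assumption, and the function names are hypothetical. The base64 check is a heuristic: a plain app name that happens to decode to printable text would be misread.

```python
import base64
import re
from datetime import datetime, timezone

# Layout inferred from the job name in this thread; treat it as an assumption.
SID_PATTERN = re.compile(
    r"scheduler_+(?P<user>[^_]+)_+(?P<app>.+?)_+RMD5(?P<hash>[0-9a-f]+)"
    r"_at_(?P<epoch>\d+)_"
)


def decode_segment(segment):
    """Try to base64-decode a name segment; return None if it is not text."""
    padded = segment + "=" * (-len(segment) % 4)
    try:
        decoded = base64.urlsafe_b64decode(padded).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    return decoded if decoded.isprintable() else None


def describe_job(job_name):
    """Extract user, app, and apparent schedule time from a scheduler job name."""
    match = SID_PATTERN.search(job_name)
    if match is None:
        return None
    app_segment = match.group("app")
    return {
        "user": match.group("user"),
        # Fall back to the raw segment when it does not look like base64 text.
        "app": decode_segment(app_segment) or app_segment,
        "scheduled_at": datetime.fromtimestamp(
            int(match.group("epoch")), tz=timezone.utc
        ),
    }


JOB_NAME = (
    "SPLK_myhost.mydomain.ca_scheduler_admin_"
    "c3BsdW5rX2FwcF9taWNyb3NvZnRfZXhjaGFuZ2U_RMD5ba20a3b8ac6badf9_at_1433178480_338_0"
)

info = describe_job(JOB_NAME)
print(info["user"], info["app"], info["scheduled_at"].isoformat())
```

With the app name in hand, the next place to look is that app's savedsearches.conf for searches scheduled around the decoded time.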

I don't have any scheduled saved searches that use the virtual index.

How do I check if I have any report acceleration summary building jobs?

apatil_splunk
Splunk Employee

That job seems to be from a scheduled report acceleration.
In Splunk, go to Settings and look under the Data section for Report Acceleration Summaries, which shows the details. Clicking a summary ID gives you a delete option if you no longer need that summary.
The details of each summary also list the corresponding saved searches under the "Reports Using Summary" column.

suarezry
Builder

I didn't see any Report Acceleration Summaries that referenced the virtual index. Just to be sure, I deleted all Report Acceleration Summaries. However, I'm still seeing MapReduce jobs in Hadoop. Any idea where to check next?
