We're running Hunk v6.2.2 connected to a Hortonworks Hadoop v2.2.4.2 cluster. I've noticed that mapreduce jobs fire off from Hunk all the time, even when no users are running searches. I have 14 pending mapreduce jobs at the moment. How do I determine what in Hunk is firing off these mapreduce jobs?
Hunk jobs will also run if you've installed any Splunk apps that contain scheduled searches with "index=*", because those searches also hit the Hadoop virtual index. Here are the culprits in my environment:
splunk_app_microsoft_exchange
splunk_app_windows_infrastructure
sideview_utils
The workaround is to change the scheduled searches in those Splunk apps from "index=*" to "(index=* AND NOT index=myhadoopprovider)".
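As a rough illustration of that rewrite, here is a minimal Python sketch that applies the same substitution to a search string. The function name and regex are my own for illustration (not part of Splunk or Hunk), and "myhadoopprovider" is just the index name used in the workaround above.

```python
import re

# Matches a bare wildcard clause such as index=* but not index=*_prod.
WILDCARD_INDEX = re.compile(r'(?<![\w.])index\s*=\s*\*(?![\w*\-])')

def exclude_virtual_index(search, virtual_index):
    """Rewrite index=* so the given virtual index is excluded (hypothetical helper)."""
    # Skip searches that already carry the exclusion, so the rewrite is safe to repeat.
    if f"NOT index={virtual_index}" in search:
        return search
    replacement = f"(index=* AND NOT index={virtual_index})"
    # A callable replacement avoids any escape processing of the replacement text.
    return WILDCARD_INDEX.sub(lambda m: replacement, search)

if __name__ == "__main__":
    original = "index=* eventtype=msexchange-database-stats | stats count by host"
    print(exclude_virtual_index(original, "myhadoopprovider"))
```

Rather than editing an app's default/savedsearches.conf directly, the modified search would normally go in a stanza of the same name in the app's local/savedsearches.conf, so it survives app upgrades.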
Hunk will run jobs only if you schedule them (they start based on the schedule), report-accelerate them (by default every 10 minutes), or manually run reports, dashboards, or searches.
So, since you do not see any jobs starting from within your Hunk search head (look under Jobs in the Hunk UI), are there any other Hunk search heads that connect to your Hortonworks cluster?
Ok, so I'm seeing this log entry in vix.splunk.home.hdfs/dispatch/scheduler_admin_c3BsdW5rX2FwcF9taWNyb3NvZnRfZXhjaGFuZ2U_RMD5f2faa9386d1f44b5_at_1433340300_2932/0/dispatch_dirs/SplunkMR_attempt_1433169589737_2064_m_000001_3/search.log:
06-03-2015 10:10:03.000 ERROR dispatchRunner - RunDispatch::runDispatchThread threw error: Application does not exist: splunk_app_microsoft_exchange
We are also licensed for the Splunk App for Microsoft Exchange. I'm seeing all sorts of searches for "index=*" in $SPLUNK_HOME/etc/apps/splunk_app_microsoft_exchange/default/savedsearches.conf. Is this what's firing off all the mapreduce jobs?
Here's a sample:
[Lookup - Database Information]
search = index=* eventtype=msexchange-database-stats | stats latest(Active) as Active,latest(MasterType) as MasterType by host,Database | eval key = host . "__" . Database | outputlookup dbInformation append=true
cron_schedule = 30 */4 * * *
dispatch.earliest_time = -8h
dispatch.latest_time = now
enableSched = true
run_on_startup = true
Yes, index=* matches any index that the user (in this case, system) is allowed to search, except internal indexes (the ones that start with _, like _internal).
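To find every scheduled search of this kind in an app, something like the following Python sketch could be run over the contents of a savedsearches.conf file. It is a simplified reader written for illustration, not Splunk's own parser: it ignores backslash line continuations and only looks at the `search` and `enableSched` keys shown in the sample above. The function names are hypothetical.

```python
import re

# Matches a bare wildcard clause such as index=* but not index=*_prod.
WILDCARD_INDEX = re.compile(r'(?<![\w.])index\s*=\s*\*(?![\w*\-])')

def parse_stanzas(conf_text):
    """Return {stanza_name: {key: value}} from .conf-style text (simplified)."""
    stanzas = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            stanzas[current] = {}
        elif current is not None and "=" in line:
            # Split on the first "=" only, since searches contain "=" themselves.
            key, value = line.split("=", 1)
            stanzas[current][key.strip()] = value.strip()
    return stanzas

def scheduled_wildcard_searches(conf_text):
    """List stanza names that are scheduled and search index=*."""
    hits = []
    for name, settings in parse_stanzas(conf_text).items():
        scheduled = settings.get("enableSched", "").lower() in ("1", "true")
        if scheduled and WILDCARD_INDEX.search(settings.get("search", "")):
            hits.append(name)
    return hits
```

Feeding it the text of each app's default and local savedsearches.conf should give a list of candidate stanzas to review; whether each one actually reaches the virtual index still depends on the searching user's allowed indexes.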
Can you please give us the name of one of the pending jobs? Also, can you please check whether you have any report acceleration summary-building jobs or scheduled saved searches that use a virtual index?
Here's the name of one job: SPLK_myhost.mydomain.ca_scheduler_admin_c3BsdW5rX2FwcF9taWNyb3NvZnRfZXhjaGFuZ2U_RMD5ba20a3b8ac6badf9_at_1433178480_338_0
I don't have any saved searches with the virtual index scheduled.
How do I check if I have any report acceleration summary building jobs?
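One clue is in the job name itself. The token between the user name and "RMD5" looks like unpadded base64, and decoding it gives an app name. The Python sketch below assumes that layout (scheduler, user, base64 app token, RMD5 hash, "at", epoch time), which is inferred from the job name quoted above rather than taken from Splunk documentation; the function name is hypothetical.

```python
import base64
import re

# Assumed layout: ...scheduler_<user>_<base64 app>_RMD5<hash>_at_<epoch>_...
JOB_NAME = re.compile(r"scheduler_+([^_]+)_+([A-Za-z0-9+/]+)_+RMD5\w*?_at_(\d+)_")

def describe_scheduler_job(job_name):
    """Return (user, app, scheduled_epoch) parsed from a scheduler job name."""
    match = JOB_NAME.search(job_name)
    if match is None:
        raise ValueError("job name does not match the assumed scheduler layout")
    user, token, epoch = match.groups()
    # Restore the base64 padding that was stripped from the token.
    padded = token + "=" * (-len(token) % 4)
    app = base64.b64decode(padded).decode("utf-8")
    return user, app, int(epoch)

if __name__ == "__main__":
    name = ("SPLK_myhost.mydomain.ca_scheduler_admin_"
            "c3BsdW5rX2FwcF9taWNyb3NvZnRfZXhjaGFuZ2U_"
            "RMD5ba20a3b8ac6badf9_at_1433178480_338_0")
    print(describe_scheduler_job(name))
```

For the job name above, the decoded app is splunk_app_microsoft_exchange, the same app named in the search.log error, which suggests the job comes from a scheduled search in that app.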
That job seems to be from a scheduled report acceleration.
In Splunk, go to Settings; under the Data section you will find Report Acceleration Summaries, which shows the details of each summary. Clicking a summary ID gives you a delete option if you no longer need that summary.
The details of each summary also list the corresponding saved searches under the "Reports Using Summary" column.
I didn't see any Report Acceleration Summaries that referenced the virtual index. Just to be sure, I deleted all Report Acceleration Summaries. However, I'm still seeing mapreduce jobs in Hadoop. Any idea where to check next?