What search commands in Hunk kick off reducers vs. trying to collect data via a streaming session? I ask because I looked at the search log for the query search index=vir-test minutesago=60:
06-17-2015 21:58:42.616 INFO ERP.isa-prod - SplunkMR$SearchHandler - Reduce search: null
06-17-2015 21:58:42.617 INFO ERP.isa-prod - SplunkMR$SearchHandler - Search mode: stream
06-17-2015 21:58:42.617 INFO ERP.isa-prod - SplunkMR$SearchHandler - setting requiredFields=*
Based on the log, it appears that a streaming job was kicked off (and it was not very fast). I have looked at
http://docs.splunk.com/Documentation/Hunk/6.2.3/Hunk/distributableandnondistributablesearchcommands , but it isn't clear which commands kick off a reducer.
A map-only MR job will be submitted to Hadoop when the search a) contains any reporting/transforming commands (assuming verbose mode is not in use) or b) contains filtering predicates.
http://docs.splunk.com/Documentation/Splunk/6.2.3/Search/Aboutreportingcommands
For example:
index=vir-test <<-- that's just streaming data
index=vir-test error OR warn <<-- this should kick off a MR job
index=vir-test | stats count by my_field <<-- this should kick off a MR job
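The rule above can be restated as a small Python sketch. This is only a heuristic paraphrase of the answer, not Hunk's actual decision logic: the function name submits_mr_job, the short TRANSFORMING list, and the naive split on "|" are all assumptions made for illustration, and every base-search term other than index=... is treated as a filtering predicate.

```python
# Assumption: a small, non-exhaustive sample of transforming commands.
TRANSFORMING = {"stats", "chart", "timechart", "top", "rare"}


def submits_mr_job(search: str, verbose: bool = False) -> bool:
    """Heuristic restatement of the rule in this answer (hypothetical helper)."""
    # Naive split: does not handle "|" inside quoted strings.
    base, *pipeline = [seg.strip() for seg in search.split("|")]

    # (a) any transforming command in the pipeline, unless verbose mode is on
    commands = {seg.split()[0] for seg in pipeline if seg}
    if not verbose and commands & TRANSFORMING:
        return True

    # (b) any base-search term besides the index selector counts as a filter
    return any(not term.startswith("index=") for term in base.split())


streaming_only = submits_mr_job("index=vir-test")
with_filter = submits_mr_job("index=vir-test error OR warn")
with_stats = submits_mr_job("index=vir-test | stats count by my_field")
```

With the three example searches above, only the first is classified as plain streaming. The sketch does not model time modifiers or other special terms, so treat it as a way to read the rule rather than a predictor of what Hunk will do.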
index=vir-test error OR warn
is intriguing.
I tried -
index=xxxx source = "*part-m-00078*" OR source = "*part-m-00079*"
I see the MapR job running, but the query runs very slowly.
Weird thing.
RE: A map-only MR job will be submitted to Hadoop...
So Hunk will never kick off a reducer job on the Hadoop side?
That's correct. The reduce function happens on the Hunk search head.
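To picture the split of work, here is a minimal Python analogy for something like stats count by my_field: each map task handles one input split, and a single merge step stands in for the reduce on the search head. The function names, the sample events, and the idea that mappers return partial counts are assumptions for illustration only; this is not Hunk code.

```python
from collections import Counter


def map_task(split_events, field):
    """Per-split work (the Hadoop side in this analogy): partial counts."""
    return Counter(e[field] for e in split_events if field in e)


def search_head_reduce(partials):
    """Single merge step (the search head in this analogy)."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return total


# Hypothetical events spread over two input splits.
splits = [
    [{"my_field": "a"}, {"my_field": "b"}],
    [{"my_field": "a"}, {"other": "x"}],
]
partials = [map_task(s, "my_field") for s in splits]
result = search_head_reduce(partials)
```

No reducer runs on the cluster in this picture; the only combining step is the final merge.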