We have three search heads in a cluster. We are observing scheduled reports being delivered with zero values for a few reports; the zero-value reports are generated from search head 3. The issue is not consistent.
We have one main search that runs every 15 minutes, and 20 sub-reports that use the main search via loadjob. These reports also run every 15 minutes, i.e. 3 minutes after the main search. A few of these reports are now being delivered to recipients with zero values from search head 3.
Looking at the scheduler log, the runtime is usually around 0.3 seconds for a successful report, but for the failed reports the runtime shows 300 seconds.
Can anyone please help me understand how to troubleshoot this?
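(For anyone comparing runtimes the same way: per-report run times can be pulled from the scheduler's internal logs with a search along these lines — a sketch, with the `savedsearch_name` filter as a placeholder for your actual report names.)

```
index=_internal sourcetype=scheduler savedsearch_name="el_*"
| stats avg(run_time) AS avg_runtime, max(run_time) AS max_runtime, count BY savedsearch_name, status, host
```

A healthy run should show a sub-second average; failed runs on one host pinned at ~300 seconds point at a timeout rather than a slow search.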
You have artifact replication problems in your search head cluster. When the replication fails, loadjob waits until it times out (300 seconds in your log), finds no artifact, and the sub-report completes with zero results.
You need to address that fundamental problem in your SHC, as it is causing the behavior you have asked about here.
It seems the job artifact may have expired (or never replicated) during those runs, so loadjob fails to retrieve it and the reports come back with zero results.
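If expiry is a factor, one mitigation is to lengthen the main search's artifact lifetime so the artifact is still available when the sub-reports fire. A sketch in savedsearches.conf, with a hypothetical stanza name (dispatch.ttl takes seconds, or a number of scheduling periods with the `p` suffix):

```
# savedsearches.conf (stanza name is hypothetical)
[el_main_search]
# keep the job artifact for 4 scheduling periods instead of the default 2
dispatch.ttl = 4p
```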
Can you consider setting up summary indexing for the main search? (If space is a problem, you can create a new index with shorter retention.) Your sub-reports will then query the summary index data as a regular search, and they'll be much more consistent.
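A minimal sketch of that layout, with the index and field names invented for illustration: the main search appends its results with `collect`, and each sub-report searches the summary index instead of calling loadjob.

```
# Main search, every 15 minutes -- write results to a summary index:
... your main search ... | collect index=summary_el marker="report=el_main"

# Sub-report, 3 minutes later -- read from the summary index:
index=summary_el report=el_main earliest=-16m | stats count BY service
```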
If the amount of data the main search generates is small, you can also consider creating a lookup table instead of summary indexing. (Summary indexing gives you historical data as well, so that's a plus.)
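The lookup variant looks roughly like this (CSV name invented for illustration; `outputlookup` overwrites the file on each run, and the SHC replicates the lookup to the other members):

```
# Main search -- persist results to a CSV lookup:
... your main search ... | outputlookup el_main_results.csv

# Sub-report -- read the lookup back:
| inputlookup el_main_results.csv | stats count BY service
```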
Thanks for the reply. The main search itself uses summary indexing.
What does the job inspector say for the failed (empty) scheduled search?
How do I check that?
In the UI: Activity > Jobs.
Find the job returning 0 results, then from the Job menu select "Inspect Job".
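If clicking through the UI is tedious, the same jobs can be listed with a REST search run on the affected search head (a sketch):

```
| rest /services/search/jobs splunk_server=local
| search isScheduled=1 resultCount=0
| table sid label resultCount runDuration
```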
This is not consistent; it happened almost 7 hours ago, and we observed the issue yesterday as well.
Out of 20 reports, only 5 were delivered with zero values.
Below is the job inspector output of a zero-values report. Can you please check?
02-11-2020 10:48:02.191 INFO SearchOperator:loadjob - triggering artifact replication uri=https://127.0.0.1:8089/services/search/jobs/scheduler_c3ZjX3N1bW1hcmlzZXI_ZWVfc2VhcmNoX2V4cG9zdXJlbG..., uri_path=/services/search/jobs/scheduler_c3ZjX3N1bW1hcmlzZXI_ZWVfc2VhcmNoX2V4cG9zdXJlbGF5ZXI_RMD506ccb4aefc833dac_at_1581417900_15508_638683B3-25D9-4D2A-AF2E-4E43362FDBFA/proxy?output_mode=json
02-11-2020 10:53:02.257 INFO SearchOperator:loadjob - shc waited for artifact replication time_elapsed=300, success=0
02-11-2020 10:53:02.258 INFO SearchOperator:loadjob - Skip loading from non-existent file="/opt/splunk/var/run/splunk/dispatch/scheduler_c3ZjX3N1bW1hcmlzZXI_ZWVfc2VhcmNoX2V4cG9zdXJlbGF5ZXIRMD506ccb4aefc833dac_at_1581417900_15508_638683B3-25D9-4D2A-AF2E-4E43362FDBFA/results.srs.gz"
02-11-2020 10:53:02.258 INFO DispatchExecutor - END OPEN: Processor=inputlookup
02-11-2020 10:53:02.258 INFO DispatchExecutor - BEGIN OPEN: Processor=stats
02-11-2020 10:53:02.268 INFO SearchOperator:inputcsv - sid:scheduler_ZXhwb3N1cmVsYXllcl9yZXBvcnRpbmc_ZWVfc2VhcmNoX2V4cG9zdXJlbGF5ZXIRMD50a40cfc998311a8d_at_1581418080_15759_644D578C-F001-4711-B459-2338E22DF399 Successfully read lookup file '/opt/splunk/etc/apps/ee_search_exposurelayer/lookups/EL_domains_services.csv'.
02-11-2020 10:53:02.272 INFO DispatchExecutor - END OPEN: Processor=stats
02-11-2020 10:53:02.272 INFO DispatchExecutor - BEGIN OPEN: Processor=eventstats
02-11-2020 10:53:02.272 INFO DispatchExecutor - END OPEN: Processor=eventstats
02-11-2020 10:53:02.272 INFO DispatchExecutor - BEGIN OPEN: Processor=table
02-11-2020 10:53:02.273 INFO DispatchExecutor - END OPEN: Processor=table
02-11-2020 10:53:02.273 INFO DispatchExecutor - BEGIN OPEN: Processor=sort
02-11-2020 10:53:02.273 INFO DispatchExecutor - END OPEN: Processor=sort
02-11-2020 10:53:02.273 INFO DispatchExecutor - BEGIN OPEN: Processor=noop
02-11-2020 10:53:02.273 INFO DispatchExecutor - END OPEN: Processor=noop
02-11-2020 10:53:02.300 INFO SearchStatusEnforcer - Triggered listener notification on state onResults
02-11-2020 10:53:02.301 INFO ReducePhaseExecutor - Ending phase_1
02-11-2020 10:53:02.301 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
02-11-2020 10:53:02.301 INFO DispatchManager - DispatchManager::dispatchHasFinished(id='scheduler_ZXhwb3N1cmVsYXllcl9yZXBvcnRpbmc_ZWVfc2VhcmNoX2V4cG9zdXJlbGF5ZXI_RMD50a40cfc998311a8d_at_1581418080_15759_644D578C-F001-4711-B459-2338E22DF399', username='exposurelayer_reporting')
02-11-2020 10:53:02.301 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
02-11-2020 10:53:02.303 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
02-11-2020 10:53:02.304 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
02-11-2020 10:53:02.304 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
02-11-2020 10:53:02.304 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
02-11-2020 10:53:02.304 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
02-11-2020 10:53:02.304 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
02-11-2020 10:53:02.305 INFO UserManager - Unwound user context: exposurelayer_reporting -> NULL
I have commented on a few of your recent questions:
Indexer Problems:
https://answers.splunk.com/answers/798224/we-see-this-error-message-search-peer-usadc-xxxxx.html
SHC Problems:
https://answers.splunk.com/answers/801804/artifactreplicator-connection-failed.html
Both of them appear to remain unresolved.
Does this question relate to the same deployment?
If so, I would suggest you tackle both of those issues first.
Problems with indexer replication and/or problems with artifact replication in an SHC could account for this type of problem.
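One way to confirm artifact replication failures from the internal logs on each search head (a sketch; widen the time range as needed, and adjust the keyword — `ArtifactReplicator` is taken from the error in the linked question):

```
index=_internal sourcetype=splunkd (log_level=ERROR OR log_level=WARN) ArtifactReplicator
| stats count BY host, component, log_level
```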
Indexer Problems:
https://answers.splunk.com/answers/798224/we-see-this-error-message-search-peer-usadc-xxxxx.html -- resolved by rebooting the servers.
SHC Problems:
https://answers.splunk.com/answers/801804/artifactreplicator-connection-failed.html -- this is not impacting performance, as these errors have existed for the last 6 months, maybe more.
The question above is my actual problem; this is a different issue that started yesterday. Splunk version is 7.2.4.