In our environment, we have syslog servers that send data to regional heavy forwarders (HFs). The data on the HFs eventually gets indexed and becomes searchable on the search heads.
The issue is that we can see the data (logs) on the HFs, but we cannot see it on the search heads.
For example: the last log present on the HF for a particular host is from 30th May, but the last log we can see on our search head for the same host is from 27th or 28th May. The 30th May logs only become searchable somewhere around 1st or 2nd June.
It is obvious there is some latency between the HF and the indexer, most likely because of bandwidth issues (confirmed).
But I would like a report from Splunk that gives us the time difference between the moment a log arrived at the HF and the moment it got indexed. Is there an SPL search for getting this report?
Thanks in advance.
Assuming that there is little-to-no latency in the arrival of the event at the HF (i.e. the timestamp in the event is very close to the time that it arrives at the HF), then you can chart _indextime - _time. So you can do something like this:
... | eval latencySeconds = _indextime - _time | timechart max(latencySeconds) avg(latencySeconds) BY sourcetype
You can change sourcetype to splunk_server, or host, or whatever dependent variable you want to research. You might also check out the Meta Woot! app, which does some of this, too:
There are some other (and some better) ways to search this out. In addition to the accepted answer above, try these searches:
| tstats count where index=* by index sourcetype source _time _indextime
| eval latencySeconds =_indextime - _time
| stats avg(latencySeconds) AS latencySeconds BY index sourcetype source
| where latencySeconds < 0
Also:
| tstats min(_time) AS early max(_time) AS late
WHERE index=*
BY host
| eval diff = late - early
| where early != late
Of course, Martin Mueller always has a much faster way to gauge Indexing lag:
https://answers.splunk.com/answers/232475/how-to-search-when-an-event-was-indexed.html
At the very bottom is the key: reduce the cardinality of _time and only look at the worst case per bucket. To quickly get a general overview of your indexing delay, consider something tstats-y like this:
| tstats max(_indextime) as max where index=foo by host _time span=1s
| eval delta = max - _time
| timechart max(delta) by host
It introduces up to half a span of error if you want averages, but it is great for detecting peaks.
Hi Woodcock,
Here we are going with the assumption that there is little or no latency in the arrival of the event at the HF. Is there a way we can measure that latency too?
So, in picture form:
Endpoint: Time T1 (event generated). Heavy forwarder: Time T2 (the same event reached the HF). Indexer: Time T3 (the same event was indexed).
So what we need is:
T2 - T1 = time taken to reach the HF
T3 - T2 = time taken to get the event indexed
T3 - T1 = total time taken for the event to become usable.
When we get the above information for each endpoint (even just a sample), we will be able to get to the bottom of the problem.
Then we have to dig deeper to find out where the problem is:
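Splunk does not record the HF arrival time (T2) by default, so one possible approach (an assumption on my part, not something confirmed in this thread) is to stamp the arrival time onto each event on the HF with an INGEST_EVAL transform, then compare it against _time and _indextime downstream. A sketch, using a hypothetical sourcetype my_syslog and transform name add_hf_arrival_time:

```
# transforms.conf on the HF (INGEST_EVAL requires Splunk 7.2+)
[add_hf_arrival_time]
INGEST_EVAL = hf_arrival_time=time()

# props.conf on the HF
[my_syslog]
TRANSFORMS-hf-arrival = add_hf_arrival_time
```

Then at search time, something like:

```
index=foo sourcetype=my_syslog
| eval t2_t1 = hf_arrival_time - _time
| eval t3_t2 = _indextime - hf_arrival_time
| eval t3_t1 = _indextime - _time
| stats avg(t2_t1) avg(t3_t2) avg(t3_t1) BY host
```

Note that hf_arrival_time ends up as an indexed field, and this only captures the time the event was parsed on the HF, so test it in your environment before relying on the numbers.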
1. HF is retransmitting or
2. indexer queues are full or
3. we are running out of CPU or
4. we are losing time reading from and writing to the disks on the HF
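For point 2 (full indexer queues), the _internal index can be checked directly. A sketch, assuming default internal-log collection is enabled (the percentage field name pct_full is my own):

```
index=_internal source=*metrics.log* group=queue
| eval pct_full = round(current_size_kb / max_size_kb * 100, 1)
| timechart avg(pct_full) BY name
```

Queues that sit near 100% (or metrics.log events with blocked=true) point to a bottleneck at or downstream of that queue.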
Thanks for your help in advance.
It would be best to ask a new question for this.
Hi Woodcock,
Can I ask why you never really answered the core question, instead of asking him to add a new question?
His question was super relevant.
Thank you woodcock for helping me out with the SPL 🙂