We are getting random false alerts from a Splunk (6.5.2) search that alerts when a certain string is not found in a log file within the last 15 minutes.
When we investigated and re-ran the search, the string was present during the alert period, so the alert should not have triggered.
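For context, the alert search is essentially of this shape (a minimal sketch only; the actual index, source path, and string are placeholders, not the real values):

```spl
index=app source="/XXX/systemerr.log" earliest=-15m "EXPECTED_STRING"
| stats count
| where count=0
```

The alert fires whenever the search returns a result, i.e. when zero matching events were indexed in the window.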
We couldn't find any relevant error in the splunkd log on the forwarder, but I did notice these two consecutive entries in metrics.log:
01-25-2019 16:55:01.800 +1100 INFO Metrics - group=per_source_thruput, series="/XXX/systemerr.log", kbps=10.196221, eps=0.193555, kb=316.072266, ev=6, avg_age=1389.166667, max_age=1667
01-25-2019 16:22:59.801 +1100 INFO Metrics - group=per_source_thruput, series="/XXX/systemerr.log", kbps=6.268667, eps=0.161285, kb=194.334961, ev=5, avg_age=211.600000, max_age=265
We got the false alert around 16:54. If I understand correctly, looking at the gap between the two metrics entries and the avg_age value (~1389 seconds, i.e. roughly 23 minutes, in the 16:55 entry), the alert may have triggered because the data was only read and indexed after 16:55; nothing was indexed from the file between 16:22 and 16:55, so the events were not yet searchable when the alert ran.
So my question is: is my understanding correct? Is the problem caused by a delay in writing the data to the source log file, or by a processing delay within Splunk itself?
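One way to separate the two cases is to compare each event's timestamp with the time it was actually indexed. A sketch of such a check (index name is a placeholder; `_indextime` is the internal field Splunk sets when an event is written to the index):

```spl
index=app source="/XXX/systemerr.log" earliest=-60m
| eval lag_seconds = _indextime - _time
| stats avg(lag_seconds) AS avg_lag, max(lag_seconds) AS max_lag BY source
```

If max_lag around the alert window is large (minutes rather than seconds), the events existed in the file with earlier timestamps but reached the index late, pointing at forwarding/indexing delay rather than the application writing late.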
Appreciate any advice,