We have set up a universal forwarder on one of our Linux servers and it started forwarding events to Splunk. Later we realized there is a mismatch between the Splunk event count and the actual raw log count on that server.
The flow is as follows:
UF---->HF----->Indexers
UF version: 6.6.2
Inputs.conf:
[monitor:///var/log/XXXX/XXXXXXX]
index=XXXX
sourcetype=XXXXX
disabled = 0
Props.conf:
[sourcetype_name]
EXTRACT-XXX_7 = (?P\d+.\d+.\d+.\d+)\s-\s-\s-\s\[(?P\d+\/\w+\/\d+:\d+:\d+:\d+\s-\d+)\]\s"(?P\w+)\s\/\s(?P\w+\/\w+.\w+)"\s+(?P\d+)
LINE_BREAKER = ([\r\n]+)\d+.\d+.\d+.\d+\s
MAX_TIMESTAMP_LOOKAHEAD = 32
NO_BINARY_CHECK = true
TIME_FORMAT = %m/%b/%Y:%H:%M:%S -%z
TIME_PREFIX=\[
TZ = AMERICA/Chicago
SHOULD_LINEMERGE = false
Sample events:
XXX.XX.XX.XXX XXX.XX.XX.XXX - - [15/Jun/2019:11:28:55 -0400] "POST /XXX/XXX/$analyze HTTP/1.1" 202 355 "-" "-" 1 40110 "http://XXXX:5088"
What might be the possible reason?
Please share your insights, TIA.
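One way to quantify the gap is to count indexed events for a fixed time window and compare against the raw line count on the server. The searches below are a generic sketch using the masked index/source placeholders from the question, not commands from the original thread:

```
| tstats count where index=XXXX source=/var/log/XXXX/XXXXXXX earliest=-1d@d latest=@d
```

And on the Linux server, for the same file:

```
wc -l /var/log/XXXX/XXXXXXX
```

Note that if LINE_BREAKER merges lines incorrectly, the Splunk event count will be lower than the raw line count even though no data was actually lost.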
Are you using useAck? It is one of those damned-if-you-do, damned-if-you-don't features. If you use it, make sure you set a huge output queue and wait time on the forwarder. Systemic indexing congestion can cause loss of data, although Splunk does try to stop listening on the data port when indexing queues fill. It happens. Normal system failures can also cause loss. Believe it or not, using this setting is more likely to cause lost events than not, because it causes so much extra work and slowdown.
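If you do enable useACK, the forwarder-side output queue can be enlarged in outputs.conf. A minimal sketch, assuming a single output group; the server name and queue size here are illustrative, not recommended values:

```
# outputs.conf on the UF (values are examples only)
[tcpout]
useACK = true

[tcpout:primary_hf]
server = hf.example.com:9997
# larger in-memory output queue so events can wait for ACKs
# instead of blocking the pipeline
maxQueueSize = 128MB
```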
Check index retention; some of your events may have already expired.
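Retention for the index can be verified from the search head with a REST search, for example (index name masked as in the question):

```
| rest /services/data/indexes
| search title=XXXX
| fields title frozenTimePeriodInSecs
```

frozenTimePeriodInSecs divided by 86400 gives the retention period in days.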
Also, check lag (... | eval lag = _indextime - _time). I have seen many cases where the UF falls farther and farther behind because there are too many files to sort through.
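The lag check above can be run per source, for example:

```
index=XXXX sourcetype=XXXXX
| eval lag = _indextime - _time
| stats min(lag) avg(lag) max(lag) by source
```

A steadily growing max(lag) suggests the forwarder cannot keep up with the volume of files being monitored.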
Thanks @woodcock for the quick reply.
I am not using "useAck".
I have checked and there is no blockage on the indexers.
The index retention period is 90 days.
I found lag in the single digits for the impacted source path (and that is in milliseconds).
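For reference, queue blockage on the indexers or heavy forwarder is typically confirmed with a metrics.log search like the following (a common pattern, not from the original thread):

```
index=_internal source=*metrics.log* group=queue blocked=true
| stats count by host, name
```

Zero results over the problem time range supports the claim that the queues were not blocked.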