Currently, I have two separate clusters: one 'old' 6.0 cluster and a new 6.2 cluster.
The idea is to have our forwarders forward to both clusters at the same time. I modified outputs.conf on the forwarders and can see events coming in on both clusters. So far, so good.
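For context, a dual-destination outputs.conf along these lines is what I mean (group and host names here are illustrative, not my actual config; the settings themselves are standard outputs.conf options):

```ini
# outputs.conf on each forwarder - send every event to both clusters
[tcpout]
defaultGroup = old_cluster, new_cluster

# 'old' 6.0 cluster
[tcpout:old_cluster]
server = idx-old-01:9997, idx-old-02:9997

# new 6.2 cluster
[tcpout:new_cluster]
server = idx-new-01:9997, idx-new-02:9997
```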
When I take a closer look, I can see events dropping on most forwarders:
index=_internal sourcetype=splunkd "has begun dropping events"
I can't find the root cause of this. No queues are blocked, the network seems to be OK, and the indexers in both clusters are fine too. When I look more closely at the local queues, I don't see any alarming levels either. No throttling either (no maxKBps messages):
index=_internal source="/opt/splunkforwarder/var/log/splunk/metrics.log" group=queue current_size_kb>0
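To narrow down which queue (if any) fills up first on a given forwarder, a search along these lines may also help; it uses the standard `blocked=true` flag that metrics.log sets on full queues (adjust the source path to your installation):

```
index=_internal source=*metrics.log group=queue blocked=true
| stats count by host, name
```

If this returns nothing while the drop messages continue, the bottleneck is more likely in the TCP output queue itself than in the ingestion pipeline.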
The only message that occurs frequently is "File descriptor cache is full (100), trimming". From what I could find, it should be regarded as an informational message, not something that really does any harm.
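For what it's worth, the cache size that message refers to should be configurable on the forwarder, so it can be ruled out by raising it. A sketch, assuming the default of 100 is the limit being hit (check the limits.conf spec for your version before relying on this):

```ini
# limits.conf on the forwarder
[inputproc]
# Maximum number of file descriptors kept open for tailed files;
# 100 is the default, which matches the number in the log message.
max_fd = 256
```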
Can anyone help me find the actual bottleneck?
2nd such post with no resolution 😞
@renems , were you able to find any root cause with this???
In case it's useful, here is the actual error message:
06-10-2015 14:00:25.833 +0200 INFO TcpOutputProc - Queue for group splunknw has begun dropping events
06-10-2015 14:00:25.833 +0200 INFO TcpOutputProc - Queue for group splunknw has stopped dropping events
06-10-2015 14:00:34.829 +0200 INFO TcpOutputProc - Queue for group splunknw has begun dropping events