Hi,
I'm running into occasional errors from one of my indexers reporting "skipped indexing of internal audit event will keep dropping events until indexer congestion is remedied. Check disk space and other issues that may cause indexer to block."
I've run the following search to look for high queue values, but I don't see anything actionable during the timeframes when the messages appear:
index="_internal" source="*metrics.log" group="queue" earliest=-4h | timechart max(current_size) span=30m by name
I also checked for forwarders flooding my indexer, and nothing was obvious.
According to SPL-37407, this is a known issue in 4.2.1 ("most often tcpout-queue"), but there's no real information on how to get it addressed. In fact, that's the only place the tcpout-queue is mentioned. So, I've got some questions:
Thanks!
tom
Good advice: install the SOS app on the indexer and check the indexing performance.
If the queues are full, then this can be:
And remember that at some point you will need more than one indexer to scale with your volume.
@yannK, is it also possible for the congestion to occur because a lot of searches are targeting the indexer? We have premium apps (ITSI/ES) enabled in our environment. Could that be the case too?
You could try the "Splunk on Splunk" App, http://apps.splunk.com/app/748
It will provide you a good overview of what's happening on your indexer.
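If installing the app isn't possible right away, searches along these lines give a rough view of queue pressure. This is only a sketch: it assumes your metrics.log queue events carry the current_size_kb, max_size_kb and blocked fields (worth confirming on your version), and the 4h range and 5m span are arbitrary choices.

```
index=_internal source=*metrics.log group=queue earliest=-4h
| eval fill_pct=round(current_size_kb / max_size_kb * 100, 1)
| timechart span=5m max(fill_pct) by name

index=_internal source=*metrics.log group=queue blocked=true earliest=-4h
| timechart span=5m count by name
```

The first search charts each queue as a percentage of its capacity, which is easier to read than raw sizes when the queues have different limits. The second counts how often each queue reported itself as blocked. If the blocked events line up with the times you see the audit message, the queue that blocks furthest downstream is usually the place to start looking.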