It seems we are having several issues with our Splunk servers/architecture and I wanted to know if anyone else has had issues. If so, were you able to get them fixed? To give you an idea of our layout, we have two indexers and two search heads (all four are big physical boxes, Windows OS). We have 10 heavy forwarders spread around the world (these are virtual boxes, Windows OS and 1 Linux). We also have several hundred universal forwarders (Windows OS) sending data to the heavy forwarders for filtering. We are now on Splunk 6.0.x, but that hasn’t helped and I’m wondering if it might have hurt things. The kinds of issues we are having are:
I’d just like to know if anyone else is having stability issues besides us. I thought that Splunk was supposed to be one of those rock solid applications that just runs, but we haven’t seen that. Maybe if it was running on Linux, but we don’t have that option.
I haven't experienced this particular problem, but I have had a similar one in the past that took weeks to track down. (If only we had Splunk back then...)
We had a firewall in place that would, after a period of inactivity, stop forwarding data on a given open port. To the client it looked like it was sending data; on the server it looked like no data was being sent. It was only when we correlated the lost connection with a 30-minute inactivity period that we were able to figure things out. We had assumed that a firewall issue would mean a blocked port, not a non-forwarded port.
The final solution for us was to use TCP Keep-alives configured at the OS level. As a temporary solution, we wrote a small script that generated a small amount of activity every 29 minutes.
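For anyone who wants to test the idea before changing OS-wide settings, here is a rough sketch of turning keep-alives on for a single socket in Python. This is not what we ran (ours was configured at the OS level) and it is not something Splunk exposes; the function name `enable_keepalive` and the 10-minute idle value are just my assumptions, picked to stay well under a 30-minute firewall timeout.

```python
import socket

def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Enable TCP keep-alive probes on one socket (illustrative helper).

    idle_s:     seconds of silence before the first probe is sent
    interval_s: seconds between probes once probing starts
    probes:     unanswered probes before the connection is dropped
                (only applied where the platform exposes TCP_KEEPCNT)
    """
    # Ask the OS to send keep-alive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms).
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, idle_s * 1000, interval_s * 1000))
    else:
        # Linux and similar: set each timer only if the constant exists.
        if hasattr(socket, "TCP_KEEPIDLE"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        if hasattr(socket, "TCP_KEEPINTVL"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        if hasattr(socket, "TCP_KEEPCNT"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keep-alive on:",
          s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The point is only that the idle timer has to be shorter than whatever inactivity timeout the firewall enforces, so the connection never looks idle to it.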
I'm not saying this is your problem, but it's worth spending a few minutes looking into.
Sounds like either, yes, Windows is being its usual flaky self (he says with admitted prejudice), or you have a recently introduced network architecture or infrastructure problem.
Yes. Sometimes it's a connection failure; other times it shows nothing in the logs. I mainly wanted to see if anyone else was having issues with their Splunk instance.
Have you checked for errors in the various splunkd logs?