Hello,
I have a test environment where the SHC members aren't allocated the recommended resources (because it's test); however, I hadn't had any issues with the environment until recently. For whatever reason, the members of my 3-node SHC keep getting shut down by signal 9, a KILL signal sent from outside the process: the server itself is killing the splunk process. The server is running out of memory, and that's the cause of the kill.
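One way to confirm that the kernel's OOM killer sent the signal 9 is to scan the kernel log for its "Killed process" lines. Here is a minimal Python sketch, assuming a Linux host whose kernel log uses that common wording; the function name `find_oom_kills` is hypothetical and the exact message text can vary between kernel versions.

```python
import re

# Assumption: the kernel logs OOM kills in the common Linux form
#   "... Killed process <pid> (<name>) total-vm:..."
# The wording may differ on other kernel versions.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a (pid, process_name) tuple for each OOM kill found in the text."""
    return [(int(m.group(1)), m.group(2)) for m in KILLED_RE.finditer(log_text)]
```

To use it, capture the kernel log on a search head (for example, the output of `dmesg`, which may need root) and pass the text to `find_oom_kills`. If splunkd or mongod shows up in the result, the OOM killer is what stopped it.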
If I restart the SHC members, the resources are freed, but the spiral starts over again.
The screenshot shows the decline in free memory; something is eating away at it.
When I run the top command on the search heads and press e to change the unit, I can see that Splunk's mongod is taking up most of the memory so far.
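To track this over time instead of watching top, the resident memory per process name can be totalled from `ps` output. This is a rough sketch, assuming a Linux `ps` that supports `ps -eo rss,comm` (RSS reported in KiB); the function name `rss_by_command` is hypothetical.

```python
from collections import defaultdict


def rss_by_command(ps_output):
    """Sum RSS (KiB) per command name from `ps -eo rss,comm` output.

    Returns (command, total_rss_kib) pairs, largest consumer first.
    """
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and blank or malformed lines.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        totals[parts[1].strip()] += int(parts[0])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Running this on captured `ps -eo rss,comm` output every few minutes and logging the top entries would show whether mongod, splunkd, or something else is the one that keeps growing between restarts.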
I also have replication issues every now and again, where I have to resync.
The solution to the OOM killer is to add more memory. Just because a system is a test system doesn't mean you can deprive it of the resources it needs.