Hello,
I have an environment with 2 search heads and 2 indexers. There are about 70 forwarders, which send around 50 MB of data a day.
lsof -i :port | wc -l # shows established connections
70
On one search head there are 6 real-time searches, which can be seen in the 'ps' output:
ps -Lef
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0
However, I see an increasing number of splunkd threads, now sitting at 39:
ps -Lef | grep -v grep | grep "splunkd -p 8089" | wc -l
39
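A side note on interpreting that number: `ps -Lef` prints one line per thread (LWP), so the 39 lines are threads belonging to one or a few splunkd processes, not 39 separate daemons. A sketch separating the two counts (the `[s]plunkd` bracket trick just keeps grep from matching itself):

```shell
# One line per thread (LWP) -- this counts threads, not processes.
ps -Lef | grep '[s]plunkd -p 8089' | wc -l

# Count distinct splunkd processes instead: unique PIDs (column 2).
ps -Lef | grep '[s]plunkd -p 8089' | awk '{print $2}' | sort -u | wc -l
```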
Furthermore, there are a couple of threads for mrsparkle:
python -O /opt/splunk/lib/python2.7/site-packages/splunk/appserver/mrsparkle/root.py restart
The problem is that Splunk ends up consuming all available memory. The Mem Used Percentage graph can be seen here.
( edit: For your information, the indexers have 34 GB of memory each. )
You can see the manual restarts, and the forced ones when memory usage reaches 100% and splunkd is killed by the OOM killer.
All Splunk instances have been upgraded to 4.3.6 and have the Deployment Monitor App disabled.
Is there anything else I can do to find out what is causing the memory leak?
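One thing that might help narrow it down is logging per-process memory over time, so whichever process is growing stands out. A minimal sketch (the log path is my choice, and `--sort` assumes GNU ps on Linux):

```shell
# Hypothetical sketch: snapshot the resident set size (RSS, in KB) of all
# splunk-related processes, largest first, appending to a log file so a
# steadily growing PID stands out across snapshots (e.g. run from cron).
LOG=/tmp/splunk_mem.log
date >> "$LOG"
# '|| true' keeps the snippet from failing when no splunk process matches.
ps -eo pid,rss,args --sort=-rss | grep '[s]plunk' >> "$LOG" || true
```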
Answering my own question, showing all the steps I took.
Upgrading the volume for the hot/warm buckets from 250 IOPS to 1200 IOPS didn't fix the memory usage pattern, but high-IOPS volumes are a good thing anyway.
On the search heads and indexers I had the Unix app installed, which produced some errors; it hadn't caused any problems before, so I didn't look at it at the time. Removing the Unix app (and the other default ones) helped a bit, but after a couple of minutes memory usage started climbing again.
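For anyone hitting the same thing: errors from the Unix app's scripted inputs land in splunkd's internal log, so checking there before removing apps may save a round trip. A sketch, assuming a default /opt/splunk installation path:

```shell
# Hypothetical sketch: show the most recent ERROR/WARN lines from splunkd's
# internal log; the path assumes a default /opt/splunk installation.
SPLUNKD_LOG=/opt/splunk/var/log/splunk/splunkd.log
if [ -f "$SPLUNKD_LOG" ]; then
    grep -E 'ERROR|WARN' "$SPLUNKD_LOG" | tail -n 20
else
    echo "no splunkd.log at $SPLUNKD_LOG"
fi
```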
Your memory usage pattern seems weird to me, because you are processing virtually no data. I have over 50 GB coming in each day on a machine with only 8 GB of memory, and it doesn't run out.
Unless you are doing some massively complex processing on the inbound data, it isn't normal to run out of memory with only 50 MB per day. I would guess there is some kind of loop behaviour in your Splunk infrastructure, but without knowing how things are set up and what you are doing with the data, it is hard to give you good guidance.
Thank you for your answer. Upgrading from 4.3.6 to 5.0.3 solved the problem.