We have Splunk running on a Linux server and it keeps crashing due to low disk space. I've traced the culprit: the server isn't actually out of space, it's running out of inodes. There is an extraordinarily large number (hundreds of thousands) of session-***** and session-*****.lock files in /opt/splunk/var/run/splunk, going back 2-3 months. Why isn't Splunk purging these old files? Any advice?
Can post log files as requested.
Thanks all
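For anyone hitting the same symptom, a quick way to confirm that inodes (not bytes) are exhausted and that the session files account for it might look like this (install path assumed; adjust to your environment):

```shell
# Show inode usage for the filesystem holding the session files;
# IUse% near 100% confirms inode exhaustion rather than full disk.
df -i /opt/splunk/var/run/splunk

# Count the session and lock files to see if they account for it.
find /opt/splunk/var/run/splunk -maxdepth 1 -name 'session*' | wc -l
```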
Apart from just deleting these files did you find any follow up solution to this?
I've run into the same issue (v4.3.3).
Edit: Appears to have been a known bug -> http://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/4.3.4 (SPL-48237)
Every request to splunkweb is creating a session lock file in var/run/splunk, eventually resulting in a DoS. (SPL-48237)
So the solution is to either upgrade to 4.3.4 OR implement a script to do a find -exec rm on files older than the last splunk restart date.
edit2: we've found that even after upgrading to 4.3.4, these files are still not purged correctly.
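One way to implement the "delete files older than the last Splunk restart" idea, assuming the splunkd.pid file's mtime reflects the last restart (worth verifying on your install):

```shell
# Delete session files no newer than the splunkd pid file, i.e. files
# created before the last restart (paths assumed; verify before running).
find /opt/splunk/var/run/splunk -name 'session*' -type f \
  ! -newer /opt/splunk/var/run/splunk/splunkd.pid -exec rm {} \;
```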
Remove tons of session files
ls -l | grep session | awk '{print "rm " $NF}' | sh
Note:
$NF -- the last column of ls -l output (the file name). The name is usually column 9, not 8, so $NF is safer across systems.
Thanks
Sincerely
John Hsu
johnthsu@hotmail.com
OK, just an update to this.
I've checked all our search head instances (both 4.3.3 and 4.3.4) and found large numbers of files in var/run/splunk (upwards of 3 million on some instances). We are, however, using F5 LTM in front of these. The F5s have health monitors set to check every 5 seconds, so there are 24 new session files per minute (2 x F5 LTMs). These are pre-authenticated sessions with very little information in them, so it's easy to distinguish them from current user logins.
I now run the following scripts every 5 minutes. Once the files have been deleted, search head web interface performance seems to improve as well (no metrics on this, however, so maybe a placebo effect 😉). If you are going to use this, check the session/lock files yourself and see if you can figure out which ones are actually valid sessions and which are stale. Adjust the following find commands as required.
For Splunk 4.3.4
As 4.3.4 includes the lock file fix, we only need to clean up the remaining session files.
This command deletes only the very small session files (~60 bytes). All our users (except the admin account) connect using SSO, so the legitimate session files are quite large by comparison; this is a good way to filter out bogus sessions.
find /opt/splunk/var/run/splunk/ -name 'session*' -mmin +60 -type f -size -65c -exec rm {} \;
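Since this runs every 5 minutes, a root crontab entry along these lines would schedule it (install path assumed; adjust to where your find command lives):

```shell
# crontab entry: clean up stale small session files every 5 minutes
*/5 * * * * /usr/bin/find /opt/splunk/var/run/splunk/ -name 'session*' -mmin +60 -type f -size -65c -exec rm {} \;
```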
For Splunk 4.3.3
There is no lock file fix, so the script needs to delete the lock files as well.
This script does the same as the 4.3.4 one, then finds orphaned .lock files and deletes those too.
#!/bin/bash
# L.K Temp script to delete session files that are not cleaned up by splunk (see bug ref: SPL-48237)
# Modified for splunk 4.3.3
# The following will delete files older than 1 hour that are smaller than 65 bytes,
# i.e. unauthenticated web logins (F5 polling script etc.)
SESSION_PATH="/opt/splunk/var/run/splunk/"
# Delete only small session files, but don't delete lock files as they *may* be tied to existing logged-in sessions
/usr/bin/find "$SESSION_PATH" -name 'session*' ! -name 'session*.lock' -mmin +60 -type f -size -65c -exec rm -v {} \;
# Find orphan lock files without a matching session file.
files=$(/usr/bin/find "$SESSION_PATH" -name 'session*.lock' -mmin +60 -type f -size -65c)
for file in $files
do
    filename=$(basename "$file")
    filename="${filename%.*}"   # strip the .lock extension
    file_path=$(dirname "$file")
    # Check if a matching session file exists.
    if [ ! -f "$file_path/$filename" ]
    then
        #echo " safe to delete lock file : $file"
        /bin/rm -v "$file"
    fi
done
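Before wiring either version into cron, a dry run that only prints the candidates (swap -exec rm for -print) lets you sanity-check the filters against real sessions:

```shell
# Dry run: list what the 4.3.4 cleanup would delete, without deleting
# anything (path assumed; adjust the filters to match your environment).
/usr/bin/find /opt/splunk/var/run/splunk/ -name 'session*' ! -name 'session*.lock' \
  -mmin +60 -type f -size -65c -print
```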
I hope this is useful to someone who's run into the same issue.