I restarted Splunk and now I am missing all of my data from before today (I believe this data was loaded after I restarted).
Can someone help me understand what happened (or could have happened) here?
Everything seems to be owned correctly:
[root@wnl-svr184b var]# ls -l /apps/wcm-splunk/var/lib/splunk
total 92
drwx------ 6 wcsplunku wcsplunku 4096 Nov 5 15:26 audit
-rw------- 1 wcsplunku wcsplunku 2 Dec 10 12:53 _audit.dat
drwx------ 2 wcsplunku wcsplunku 4096 Nov 5 15:26 authDb
drwx------ 6 wcsplunku wcsplunku 4096 Nov 5 15:26 blockSignature
-rw------- 1 wcsplunku wcsplunku 1 Dec 10 12:53 _blocksignature.dat
drwx------ 6 wcsplunku wcsplunku 4096 Nov 7 08:20 charlesriver
-rw------- 1 wcsplunku wcsplunku 2 Dec 10 12:53 charlesriver.dat
drwx------ 6 wcsplunku wcsplunku 4096 Nov 7 14:39 defaultdb
drwx------ 8 wcsplunku wcsplunku 4096 Dec 10 14:28 fishbucket
drwx------ 2 wcsplunku wcsplunku 4096 Nov 5 15:26 hashDb
-rw------- 1 wcsplunku wcsplunku 1 Dec 10 12:53 history.dat
drwx------ 6 wcsplunku wcsplunku 4096 Nov 5 15:26 historydb
-rw------- 1 wcsplunku wcsplunku 2 Dec 10 12:53 _internal.dat
drwx------ 6 wcsplunku wcsplunku 4096 Nov 5 15:26 _internaldb
-rw------- 1 wcsplunku wcsplunku 1 Dec 10 12:53 main.dat
drwx------ 3 wcsplunku wcsplunku 4096 Dec 10 14:17 persistentstorage
-rw------- 1 wcsplunku wcsplunku 1 Dec 10 12:53 summary.dat
drwx------ 6 wcsplunku wcsplunku 4096 Nov 5 15:26 summarydb
drwx------ 6 wcsplunku wcsplunku 4096 Nov 6 11:06 test
drwx------ 6 wcsplunku wcsplunku 4096 Nov 8 09:44 testapp
-rw------- 1 wcsplunku wcsplunku 1 Dec 10 12:53 testapp.dat
-rw------- 1 wcsplunku wcsplunku 1 Dec 10 12:53 test.dat
-rw------- 1 wcsplunku wcsplunku 1 Dec 10 12:53 _thefishbucket.dat
It appears that my index size cap was too small and the data was frozen (even though I have no frozen-data directory configured). I ran this query and found that data had been frozen out of the charlesriver index:
index=_internal source="/apps/wcm-splunk/var/log/splunk/splunkd.log" charlesriver freeze
It showed records such as this:
11-28-2013 04:03:28.156 -0500 INFO BucketMover - AsyncFreezer freeze succeeded for bkt='/apps/wcm-splunk/var/lib/splunk/charlesriver/db/db_1385355600_1384810904_22'
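The same check can also be done outside Splunk by grepping splunkd.log directly (a sketch, using the log path from the post; adjust for your installation):

```shell
# Show BucketMover freeze events for the charlesriver index in splunkd.log:
grep 'AsyncFreezer freeze succeeded' /apps/wcm-splunk/var/log/splunk/splunkd.log \
  | grep 'charlesriver'
```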
Now my question is: can I recover these buckets, or are they lost for good, considering I have no directory configured for frozen data?
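For reference: by default, when a bucket is frozen and no coldToFrozenDir (or coldToFrozenScript) is configured, Splunk deletes the bucket, so there is usually nothing left to thaw. To keep future frozen buckets, an archive path can be set per index in indexes.conf (a sketch; the index name comes from the post, the archive path is a hypothetical example):

```ini
# indexes.conf (sketch): archive frozen buckets instead of deleting them
[charlesriver]
homePath   = $SPLUNK_DB/charlesriver/db
coldPath   = $SPLUNK_DB/charlesriver/colddb
thawedPath = $SPLUNK_DB/charlesriver/thaweddb
# Hypothetical archive location; Splunk copies rawdata here on freeze:
coldToFrozenDir = /apps/wcm-splunk/frozen/charlesriver
# Consider also raising the size cap that triggered the freezing:
# maxTotalDataSizeMB = 500000
```

Archived buckets can later be copied into the index's thaweddb directory and rebuilt with `splunk rebuild <bucket_dir>` to make them searchable again.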
Important: don't run the commands below if you aren't sure what they do. You could end up changing owner:group permissions on your entire system, which is a pain in the arse.
Without much info to go on, it sounds like you might have restarted Splunk as the wrong user. Are you using Linux? I am, and I've done this before. On my setup I run Splunk as the user 'splunk', and all files and folders should be owned by that user.
I found out a few hours after IT restarted Splunk as the 'root' user that something was wrong. I restarted from the command line, specifying which user (splunk) it should run under:
sudo -H -u splunk "$SPLUNK_HOME"/bin/splunk restart
This didn't solve the issue completely because, after IT restarted Splunk as 'root', newly indexed data and other files were owned by 'root'. The symptom was that after I restarted Splunk as the 'splunk' user, I could not see anything that had been indexed while Splunk was running as 'root'. My data only showed events from the day before and earlier.
To fix it, I stopped Splunk and changed owner:group on every single file and directory under the Splunk home directory.
From the parent directory of the splunk home directory:
sudo chown splunk:splunk -R splunk/
Then I restarted again:
sudo -H -u splunk "$SPLUNK_HOME"/bin/splunk restart
For some reason this didn't change some files, so I had to search for any files in the Splunk directory that weren't owned by the splunk user. I manually ran chown against those files, restarted Splunk correctly, and voila. Back to normal.
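The "search for files not owned by splunk" step can be scripted with find rather than done by hand (a sketch, assuming the same splunk/ directory and splunk user as above):

```shell
# List everything under the Splunk home that is NOT owned by user splunk:
find splunk/ ! -user splunk -print

# Then fix ownership on just those entries:
sudo find splunk/ ! -user splunk -exec chown splunk:splunk {} +
```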
It appears all of the data files are owned appropriately (see original post above, edited to include the ls -l output).
Is your splunk/var directory a mapped network drive or a symlink? When I ran chown the first time it didn't follow the symlinked directory, so I had to go into that directory and run the command again.
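A quick way to check for that (a sketch; run from the parent of the Splunk home directory):

```shell
# Show whether var itself is a symlink (look for 'l' in the mode or a '->'):
ls -ld splunk/var

# Find any symlinks in the top few levels of the Splunk home:
find splunk/ -maxdepth 3 -type l
```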
This is what I was thinking as well. I did manage to chown the directory correctly, but when I restart I am still missing my data. Suggesting this may not be the cause, I also started Splunk as root and am still missing the data.