Hi,
splunkd.log and crash.log are full of the messages below, and then splunkd goes down.
What does this mean?
crash.log
(Out of file descriptors!)
[build 119532] 2012-04-26 18:40:56
File descriptors open:
0: /opt/splunk/var/log/splunk/crash-2012-04-26-18:40:56.log
1: /opt/splunk/var/log/splunk/splunkd_stdout.log
2: /opt/splunk/var/log/splunk/splunkd_stderr.log
3: /opt/splunk/var/log/splunk/splunkd.log
4: socket:[40632058]
5: socket:[40632059]
6: socket:[40632060]
(...snipped...)
1020: /data/splunk/var/lib/splunk/cd_os/db/hot_v1_1939/Strings.data
1021: /data/splunk/var/lib/splunk/cd_os/db/hot_v1_1940/SourceTypes.data
1022: /data/splunk/var/lib/splunk/cd_os/db/hot_v1_1940/Strings.data
1023: /data/splunk/var/lib/splunk/cd_os/db/hot_v1_1941/SourceTypes.data
(Total 1024)
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 6885 running under UID 0.
Crashing thread: indexerPipe
Registers:
RIP: [0x00000030D1830265] gsignal + 53 (/lib64/libc.so.6)
RDI: [0x0000000000001AE5]
RSI: [0x0000000000001AEE]
RBP: [0x000000004208E940]
RSP: [0x000000004208DB08]
RAX: [0x0000000000000000]
RBX: [0x000000004208DBB0]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x0000000000000080]
R9: [0x0101010101010101]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x00007FFFD1786A1A]
R13: [0x0000000001184250]
R14: [0x0000000000000327]
R15: [0x00000000011839D0]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace:
Linux / splunkindex1 / 2.6.18-194.el5 / #1 SMP Tue Mar 16 21:52:39 EDT 2010 / x86_64
splunkd.log
04-28-2012 12:01:38.875 +0900 INFO timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_5861/.rawSize": No such file or directory
04-28-2012 12:01:38.875 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=5861
04-28-2012 12:01:38.878 +0900 INFO timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_6356/.rawSize": No such file or directory
04-28-2012 12:01:38.878 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=6356
04-28-2012 12:01:38.880 +0900 INFO timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_6701/.rawSize": No such file or directory
04-28-2012 12:01:38.880 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=6701
04-28-2012 12:01:38.881 +0900 INFO timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_6987/.rawSize": No such file or directory
04-28-2012 12:01:38.881 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=6987
04-28-2012 12:01:38.882 +0900 INFO timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_7155/.rawSize": No such file or directory
04-28-2012 12:01:38.882 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=7155
04-28-2012 12:01:38.884 +0900 INFO timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_7353/.rawSize": No such file or directory
04-28-2012 12:01:38.884 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=7353
04-28-2012 12:01:38.887 +0900 INFO timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_8029/.rawSize": No such file or directory
04-28-2012 12:01:38.888 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=8029
04-28-2012 12:01:38.889 +0900 INFO timeinvertedIndex - Unable to read raw size file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_8357/.rawSize": No such file or directory
04-28-2012 12:01:38.890 +0900 ERROR DatabaseDirectoryManager - Unable to get size on disk for bucket id=8357
04-28-2012 12:01:38.896 +0900 INFO HotDBManager - index=dh_os No hot found for event ts=1334372461, closest match=null [expanded span=0] hotbucketsize=87 numbucks=1 maxhot=3
04-28-2012 12:01:38.896 +0900 INFO databasePartitionPolicy - creating new bucket /data/splunk/var/lib/splunk/dh_os/db/hot_v1_8643
04-28-2012 12:01:38.896 +0900 ERROR JournalSlice - Cannot create new journal slice file: Too many open files, file="/data/splunk/var/lib/splunk/dh_os/db/hot_v1_8643/rawdata/0"
04-28-2012 12:01:38.896 +0900 ERROR JournalSlice - Failed to write header for rawdata
04-28-2012 12:01:38.896 +0900 INFO HotDBManager - index=dh_os No hot found for event ts=1334372461, closest match=null [expanded span=0] hotbucketsize=87 numbucks=1 maxhot=3
04-28-2012 12:01:38.896 +0900 FATAL HotDBManager - hot dir with id already exists in createDir: /data/splunk/var/lib/splunk/dh_os/db/hot_v1_8643
As it says at the very top, you are out of file descriptors. You need to increase the number of file descriptors available, preferably to "unlimited", possibly using the ulimit command, or by contacting your system administrator.
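For illustration, here is a minimal Python sketch of reading and changing the open-file limit from inside a process. It is only a sketch: the function names are made up for this example, and it affects only the calling process and its children, not a splunkd that is already running. For Splunk itself, the limit has to be raised in whatever shell or init script starts splunkd.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def set_soft_nofile(target):
    """Set the soft open-file limit to `target`, capped at the hard limit.

    Hypothetical helper for this example. Moving the soft limit up to the
    hard limit needs no special privileges; raising the hard limit itself
    normally requires a privileged user.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling set_soft_nofile(10240) returns the resulting (soft, hard) pair, so you can see whether the requested value was actually granted or was capped by the hard limit.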
By the way, it was probably unhelpful to simply paste over a thousand lines of text into a discussion forum where you are asking people to volunteer help, without first taking some time to filter it even a little for relevance, or asking whether it would be useful.
My open files value is 4,096 (both soft and hard).
Would raising the value to 10240 improve things?
But is this really a ulimit problem?
Here is your friend for those cases:
http://splunk-base.splunk.com/answers/13313/how-to-tune-ulimit-on-my-server
Unix-like operating systems limit the number of files that a single process can have open. In your case, RHEL5 defaults to 1024 per process. A Splunk indexer needs several file descriptors for each open index bucket, as well as one descriptor per connected forwarder, so it is easy to exhaust a limit of 1024. You will need to scale this value to the workload you are trying to run. This doc is helpful, even if it is specific to RH Directory Server: http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html/Performance_Tuning_Guide/system-...
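One way to check whether this is really a ulimit problem is to look at the limit the running splunkd process actually has, which can differ from what ulimit -n reports in your login shell. The crash log above ends at descriptor 1023 (Total 1024), which would be consistent with an effective limit of 1024 rather than 4096. A rough Python sketch, assuming Linux with /proc mounted; the helper names are hypothetical, and the PID is whatever your splunkd process is:

```python
import os


def parse_open_files_limit(limits_text):
    """Extract (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because either one may be 'unlimited'.
    Returns None if the line is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> files
            fields = line.split()
            return fields[3], fields[4]
    return None


def open_fd_count(pid):
    """Count the descriptors a process has open (needs permission to read /proc/<pid>/fd)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def report(pid):
    """Print the effective open-file limit and current descriptor usage for one PID."""
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_open_files_limit(f.read())
    print(f"pid {pid}: limits (soft, hard) = {limits}, open now = {open_fd_count(pid)}")
```

Run report() as root (or as the user that owns splunkd) with the splunkd PID. If the soft limit shown is 1024 while your shell reports 4096, the process was started from an environment where the higher limit was not in effect.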