I have 1000+ JSON files in a directory, and the files are overwritten every day. The file names all start with the same characters, as shown below:
1000010496,1000011820,1000013553,1000010097,1000010362...
My issue is that the Splunk forwarder is not reading all of the files. I have tried flushing the fishbucket, deleting the indexed data, setting crcSalt, and adding a timestamp to the filename, and none of these have gotten me the complete data; only a small fraction of the source files show up in Splunk. How can I read these 1000+ files repeatedly without missing data?
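For context, my monitor stanza in inputs.conf looks roughly like this sketch (the path, index, and sourcetype here are placeholders, not my real values); crcSalt = <SOURCE> was one of the things I tried:
[monitor:///data/advisories/*.json]
index = main
sourcetype = _json
crcSalt = <SOURCE>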
The JSON files start like this:
$result = [
{
'advisory_type' => 'Security Advisory',
'date' => '10/12/17',
'advisory_name' => 'CL-SA-2017:0061',
} ....
....
Thanks in advance.
The problem is that you have too many files/directories to sort through, and Splunk is getting bogged down tracking everything. You need to make sure there is a housekeeping process (logrotate can do this) deleting the older log files so they do not hang around "forever"; see the sketch after this paragraph. This will only get worse: Splunk forwarders really start to bog down when they have to track and sort through thousands of files, and once the forwarder cannot finish a pass before it is scheduled to go back around and check again (I have no idea what the exact numbers are for this), you are in a never-ending cycle of failure and ever-worsening delays. Also, check your inodes, and make sure the ulimit values for the splunk user (open files, processes) are raised well beyond the defaults.
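As a sketch (the directory path and retention window here are made up, so adjust them to your setup), a daily cron job like this keeps the directory pruned if logrotate is not a good fit for files that are overwritten in place:
# prune JSON files older than 2 days from the monitored directory (hypothetical path)
find /data/advisories -name '*.json' -mtime +2 -delete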
I find these helpful for raising the limits. For RHEL6 and earlier:
cd /etc/security
cat >>limits.conf <<EOF
# raise open-file (nofile) and process (nproc) limits for all users
* hard nofile 102400
* soft nofile 10240
* hard nproc 16384
* soft nproc 16384
EOF
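Once the splunk user logs in again (or after a reboot), you can sanity-check that the new soft limit is in effect:
su - splunk -c 'ulimit -n'    # should print 10240, the soft nofile limit set above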
And for RHEL7+:
mkdir -p /etc/systemd/system/splunk.service.d
cat >> /etc/systemd/system/splunk.service.d/filelimit.conf <<EOF
[Service]
# per-service open-file limit for the splunk unit
LimitNOFILE=10240
EOF
Reboot afterwards.
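If a full reboot is not convenient on RHEL7+, reloading systemd and restarting the service should be enough to pick up the drop-in (this assumes the unit is named splunk, matching the directory above):
systemctl daemon-reload
systemctl restart splunk
systemctl show splunk --property=LimitNOFILE    # should print LimitNOFILE=10240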
These can be found around Answers and Docs, but I've provided them here for quick reference. Other versions of Linux will vary, but these values are typical. Check your version to make sure they will work for you!
After adding initCrcLength = 1048576 this issue got resolved, but when the sources were overwritten, the unique source count in the search head dropped again.
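For anyone else hitting this, the setting goes into the monitor stanza in inputs.conf on the forwarder (the path here is a placeholder):
[monitor:///data/advisories/*.json]
initCrcLength = 1048576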
Have you ever had all of them indexed (like on the initial start of the forwarder, not just re-reading the files after they are updated)?
No, at the time of the first indexing Splunk didn't read all the files; it listed only 356 sources instead of 1300.
Do you get a full list of the files when you run this on the forwarder?
splunk list monitor
Yes, I am getting the full list, but on the search head I see only 229 unique sources. I think Splunk is monitoring the paths specified in the monitor stanza but not actually reading the files, to avoid re-indexing the same filename or content.
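This is the kind of search I am using on the search head to count unique sources (the index name is a placeholder):
index=main sourcetype=_json | stats dc(source) AS unique_sources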
If anyone has a solution, please post it.
If you have purchased Splunk and have a valid support contract, I'd submit a case to Splunk support.
Also, if you are not running the latest version of Splunk, you may want to upgrade.
And finally, if there are empty JSON files, they will not show up on the indexers or in searches, because there is no data to index. Check for empty files; a quick way to do this is shown below.
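You can check from the forwarder like this (the directory path is a placeholder):
find /data/advisories -name '*.json' -size 0 -print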