
Indexing some, but not all, files

charlesm
Explorer

I know there are similar questions, but none match my case exactly and their answers don't seem to apply. Also, I'm a noob, so forgive me if my terminology isn't quite right.

I have a forwarder running on Server1 and an indexer on Server2.

My inputs.conf is simple and looks like this:

[monitor:///opt/mypath/*.log.2011-04-*]

It's written that way because I'm evaluating Splunk, and to stay below 500 MB I can only index this month's log files.
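A quick sanity check is to test the actual filenames against the wildcard part of the monitor stanza. This is a minimal Python sketch using `fnmatch`, assuming Splunk's `*` behaves like a shell wildcard within a single directory level; the filenames are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames following the source-version.log.yyyy-mm-dd convention.
FILES = [
    "sourceA-1.8.1.log.2011-04-01",
    "sourceA-1.8.1.log.2011-03-31",
    "sourceB-1.10.log.2011-04-15",
    "sourceC-2.0.log.2011-04-30",
]

# Wildcard part of the monitor stanza, applied to the basename only (assumption).
PATTERN = "*.log.2011-04-*"


def matching(names, pattern=PATTERN):
    """Return the names that the wildcard pattern would select."""
    return [n for n in names if fnmatchcase(n, pattern)]


if __name__ == "__main__":
    for name in FILES:
        flag = "match" if fnmatchcase(name, PATTERN) else "skip "
        print(flag, name)
```

If every file you expect shows up as a match, the pattern itself is probably not what is excluding them.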

My files are log4j-style output and follow this naming convention: source-version.log.yyyy-mm-dd

where source is the name of the software producing the file and version is its release version (e.g. 1.8.1, 1.10). The rest, I'm sure, is self-explanatory.
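To make the convention concrete, here is a small Python sketch that splits a filename into source, version and date. It assumes the version is the last hyphen-separated run of dotted digits, so a source name that itself contains hyphens still parses; that rule is an assumption, not something stated in the post.

```python
import re
from datetime import date

# source-version.log.yyyy-mm-dd; the greedy source group backtracks until the
# remainder looks like "-<dotted digits>.log.<yyyy>-<mm>-<dd>".
NAME_RE = re.compile(
    r"^(?P<source>.+)-(?P<version>\d+(?:\.\d+)*)"
    r"\.log\.(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$"
)


def parse_name(filename):
    """Split a log filename into (source, version, date); None if it doesn't fit.

    Raises ValueError if the digits form an impossible date (e.g. 2011-02-30).
    """
    m = NAME_RE.match(filename)
    if m is None:
        return None
    d = date(int(m["year"]), int(m["month"]), int(m["day"]))
    return m["source"], m["version"], d


print(parse_name("sourceA-1.8.1.log.2011-04-05"))
# ('sourceA', '1.8.1', datetime.date(2011, 4, 5))
```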

The behavior I'm seeing is that some files are being indexed, and others are not. I can't find a consistent pattern.

For instance:

  • some, but not all, of the sourceA files are being indexed.
  • only one of the sourceB files is being indexed.
  • none of the sourceC files are being indexed.

I added crcSalt=<SOURCE> to my inputs.conf, which slightly increased the number of files read (though they came through as a single event, which is another problem), but I'm still missing nearly half of them.
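As I understand it, the file monitor recognizes a file by a checksum of its first 256 bytes, and crcSalt=<SOURCE> mixes the file's full path into that checksum, so two files with identical headers no longer look like the same file. The Python below is a toy model of that idea using CRC32; the 256-byte window, the hashing details and the filenames are assumptions, not Splunk's actual code.

```python
import zlib

HEAD_BYTES = 256  # assumed size of the file-head window used for the checksum


def file_identity(head: bytes, path: str, salt_with_source: bool = False) -> int:
    """Toy file identity: CRC32 of the head, optionally salted with the path."""
    crc = zlib.crc32(head[:HEAD_BYTES])
    if salt_with_source:
        # Continue the CRC over the path bytes, roughly what <SOURCE> salting means.
        crc = zlib.crc32(path.encode("utf-8"), crc)
    return crc


# Two different files whose first 256 bytes are identical, e.g. because both
# start with the same log4j startup banner (hypothetical content).
banner = b"=" * 300
a = ("/opt/mypath/sourceA-1.8.1.log.2011-04-01", banner + b"first file body")
b = ("/opt/mypath/sourceA-1.8.1.log.2011-04-02", banner + b"second file body")

print(file_identity(a[1], a[0]) == file_identity(b[1], b[0]))              # True
print(file_identity(a[1], a[0], True) == file_identity(b[1], b[0], True))  # False
```

Without the salt the two files collapse into one identity; with it, the differing paths keep them apart.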

I've looked in splunkd.log (on both servers) but didn't see anything referencing the missing file names.

Some background:

Originally I tried indexing the entire directory, but it quickly exceeded 500 MB, so I had to run ./splunk clean eventdata.

Also, I created my own datetime.xml, referenced from a local props.conf, so that the date is derived from the filename.

datetime.xml:

<datetime>
   <define name="_fndate" extract="year, month, day">
      <text><![CDATA[source::.*?\\*.log.\d{4}-\d{2}-\d{2}]]></text>
   </define>
</datetime>

props.conf:

[my_sourcetype]
DATETIME_CONFIG = datetime.xml

It seems unlikely that this is causing the problem, since other files with the same naming convention, and in some cases the same log format, do get indexed.

Also, if I remove both props.conf and datetime.xml from my local directory and restart the forwarder, nothing changes.

Thanks for reading!

Accepted solution:

charlesm
Explorer

OK, it occurred to me to try cleaning the indexes on the forwarder, not just on the indexer. After discovering that 'clean eventdata' doesn't work on forwarders, I did some googling and learned about 'clean all'.

This fixed it. Thanks, gkanapathy! If I hadn't gone through the steps in my comment to answer your questions, I wouldn't have discovered this.


gkanapathy
Splunk Employee

Actually, you can index any total amount of data; the 500 MB is a daily limit. Furthermore, under the trial license (and the free license) you can exceed the limit either 3 or 5 times in 30 days, so you could probably index all your data if you do it within 3 days.
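The arithmetic behind that suggestion, as a Python sketch: count the days over the 500 MB limit in every 30-day span and compare that with the allowed number of overages. Reading "3 or 5 times in 30 days" as a rolling 30-day window, and the daily volumes in the example, are assumptions for illustration.

```python
from typing import Sequence

DAILY_LIMIT_MB = 500
WINDOW_DAYS = 30


def violations_ok(daily_mb: Sequence[float], allowed: int,
                  limit_mb: float = DAILY_LIMIT_MB,
                  window: int = WINDOW_DAYS) -> bool:
    """True if no rolling `window`-day span has more than `allowed` days over `limit_mb`."""
    over = [mb > limit_mb for mb in daily_mb]
    # If there are fewer days than the window, check the whole series once.
    for start in range(max(1, len(over) - window + 1)):
        if sum(over[start:start + window]) > allowed:
            return False
    return True


# e.g. a backlog indexed over a few heavy days (hypothetical volumes)
print(violations_ok([2000, 2000, 2000], allowed=3))        # True
print(violations_ok([2000, 2000, 2000, 2000], allowed=3))  # False
```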

So don't add crcSalt to your inputs; just use plain monitor inputs.

splunkd.log won't list ignored files at the default logging level. Are you sure it's the forwarder that has stopped, rather than, say, the indexer hitting its minimum free disk space? (Every volume that Splunk might write to must have 2000 MB free, or it stops indexing.) If you suspect the forwarder and its file monitor, you can debug and monitor that as described here: http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/
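To rule out the free-space explanation, a minimal Python sketch can report any volume below the 2000 MB threshold. The path in the example is a placeholder, and the script only checks free space as the operating system reports it; it does not read Splunk's own setting for that threshold.

```python
import shutil

MIN_FREE_MB = 2000  # threshold quoted in the reply


def low_space_volumes(paths, min_free_mb=MIN_FREE_MB):
    """Return {path: free_mb} for every path whose volume has less than min_free_mb free."""
    low = {}
    for path in paths:
        free_mb = shutil.disk_usage(path).free / (1024 * 1024)
        if free_mb < min_free_mb:
            low[path] = round(free_mb)
    return low


if __name__ == "__main__":
    # Placeholder path; substitute the directories your Splunk instance writes to.
    print(low_space_volumes(["/"]))
```

An empty result means every listed volume is above the threshold.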

If you believe it is an input problem, it would also help to know:

* how many folders
* how many files
* how many files are actively being written
* roughly how many files fall into each size range
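A Python sketch that gathers those numbers for a monitored directory. "Actively written" is approximated here by a recent modification time, and both that time window and the size buckets are assumptions, not anything Splunk itself uses. The folder count includes the root directory.

```python
import os
import time
from collections import Counter

ACTIVE_WINDOW_S = 300             # assumption: modified in the last 5 minutes = active
SIZE_BUCKETS_MB = [1, 5, 10, 50]  # assumption: arbitrary upper bounds for the histogram


def bucket(size_mb):
    """Label a size (in MB) with the first bucket it falls under."""
    for upper in SIZE_BUCKETS_MB:
        if size_mb < upper:
            return f"<{upper} MB"
    return f">={SIZE_BUCKETS_MB[-1]} MB"


def inventory(root):
    """Count folders, files, recently modified files and size buckets under root."""
    folders = files = active = 0
    sizes = Counter()
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        folders += 1
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # file rotated away between listing and stat
            files += 1
            if now - st.st_mtime < ACTIVE_WINDOW_S:
                active += 1
            sizes[bucket(st.st_size / (1024 * 1024))] += 1
    return {"folders": folders, "files": files, "active": active, "sizes": dict(sizes)}


if __name__ == "__main__":
    print(inventory("/opt/mypath"))
```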


charlesm
Explorer

Thanks for your response!

I'm not entirely sure it's an input problem. I removed crcSalt from inputs.conf, did another clean eventdata, and restarted the forwarder so that I could give you accurate counts, but only newly created files made it into the index.

Based on some reading, could this be related to the fishbucket? The "clean eventdata" output indicated it had been cleaned up, though.
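My understanding is that the fishbucket is where the forwarder records which files it has already read (keyed by a checksum of the file head) and how far into each it got. That would explain files being skipped when the indexer's data is cleaned but the forwarder's record is not. The Python below is a toy model of that bookkeeping, not Splunk's implementation; the class, its methods and the 256-byte head window are hypothetical.

```python
import zlib

HEAD_BYTES = 256  # assumed checksum window over the start of each file


class ToyFishbucket:
    """Toy record of files already read: file-head CRC -> bytes consumed so far."""

    def __init__(self):
        self.seen = {}

    def read_new(self, content: bytes) -> bytes:
        """Return only the bytes not yet indexed, and remember how far we got."""
        crc = zlib.crc32(content[:HEAD_BYTES])
        offset = self.seen.get(crc, 0)
        self.seen[crc] = len(content)
        return content[offset:]

    def clean(self):
        """Forget every record, as resetting the fishbucket would."""
        self.seen.clear()


fb = ToyFishbucket()
log = b"2011-04-01 INFO started\n" * 20     # 480 bytes, longer than the head window
print(len(fb.read_new(log)))                # 480: first read takes the whole file
print(len(fb.read_new(log + b"tail\n")))    # 5: only the appended bytes
fb.clean()
print(len(fb.read_new(log)))                # 480: after cleaning, read from the start
```

In this model, cleaning only the indexer's data would leave `seen` untouched, so unchanged files would keep returning nothing.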

New counts:
  • 1 folder, 53 files; 2 of them indexed (previously 9, or 20 with crcSalt).
  • None are actively being written.
  • 3,699 events in total (758 and 2,941 from the two indexed files).
  • The two indexed files are 3 MB and 4.5 MB.
  • The other 51 range from 1 to 5 MB.
