Splunk not indexing modified files

phoenixdigital
Builder

I have an FTP data collector which pulls in files from an FTP server and dumps them into a directory monitored by Splunk.

The files are all the IDA00*.dat files, sourced from ftp://ftp2.bom.gov.au/anon/gen/fwo/

My script checks this FTP server about every 6 hours, and if the modified date on a file has changed it re-downloads the file and replaces the copy in /home/phoenix/data/bom/

Splunk is set up to monitor this directory with the following conf files:

inputs.conf

[monitor:///home/phoenix/data/bom]
disabled = 0
followTail = 0
host = BOM
index = bom
crcSalt = <SOURCE>

props.conf

[source::...[/\\]bom[/\\]IDA00001.dat]
KV_MODE = none
SHOULD_LINEMERGE = false
sourcetype = bomIDA00001
REPORT-extractIDA00001 = IDA00001_Fields
priority = 100

priority = 100 is required because Splunk ignores .dat files by default. Recently I also had to remove dat from /opt/splunk/etc/default/props.conf, because the priority setting stopped working for some reason and the data was being treated as binary (but that's for another topic).

transforms.conf

[IDA00001_Fields]
DELIMS = "#"
FIELDS = loc_id,location,state,forecast_date,issue_date,issue_time,min_0,max_0,min_1,max_1,min_2,max_2,min_3,max_3,min_4,max_4,min_5,max_5,min_6,max_6,min_7,max_7,forecast_0,forecast_1,forecast_2,forecast_3,forecast_4,forecast_5,forecast_6,forecast_7,dummy
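For anyone unsure what the DELIMS/FIELDS pair above is meant to do, here is a rough shell sketch of the same idea: split a record on "#" and pair the Nth value with the Nth field name. It only imitates the mapping with awk and is not how Splunk implements it. The function name split_bom_line and the sample record are made up for illustration (the record is not real BOM data and is cut short to six values).

```shell
# Field names copied from FIELDS in transforms.conf above.
FIELDS="loc_id,location,state,forecast_date,issue_date,issue_time,min_0,max_0,min_1,max_1,min_2,max_2,min_3,max_3,min_4,max_4,min_5,max_5,min_6,max_6,min_7,max_7,forecast_0,forecast_1,forecast_2,forecast_3,forecast_4,forecast_5,forecast_6,forecast_7,dummy"

# Hypothetical record, invented for illustration; only the first six fields.
SAMPLE='X001#Exampleville#TAS#20110929#20110929#0500'

# Print name=value pairs: the Nth "#"-separated value gets the Nth field name.
# Stops at whichever runs out first, the values or the names.
split_bom_line() {
    printf '%s\n' "$1" | awk -F'#' -v names="$FIELDS" '
        BEGIN { n = split(names, name, ",") }
        {
            for (i = 1; i <= n && i <= NF; i++)
                printf "%s=%s\n", name[i], $i
        }'
}

split_bom_line "$SAMPLE"
```

With the sample above, the first line printed is loc_id=X001 and the last is issue_time=0500.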

Now this seemed to be working OK for a while, but for some reason Splunk has stopped indexing the files even though new files are coming in with completely different data (in particular the forecast_date). I can only see data in index=bom from the 28th of Sept and earlier. It is now the 29th and there should be data in Splunk for that.

Running the following returns some actions on the files in question

grep IDA00001.dat /opt/splunk/var/log/splunk/splunkd.log

09-29-2011 13:52:24.489 +1000 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/home/phoenix/data/bom/IDA00001.dat'.
09-29-2011 14:48:50.167 +1000 INFO  WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/home/phoenix/data/bom/IDA00001.dat'.
09-29-2011 14:48:50.167 +1000 INFO  WatchedFile - Will begin reading at offset=0 for file='/home/phoenix/data/bom/IDA00001.dat'.

So it seems Splunk is working on the files. But are they actually being indexed, given that the data is not showing up?

Any help would be appreciated.

1 Solution

phoenixdigital
Builder

Something I just remembered about this issue.

The file had the extension .dat, which one of the Splunk configuration files classifies as a binary file.

We ended up removing dat from /etc/system/default/props.conf under the stanza:

[source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)]
sourcetype = known_binary

Obviously the correct way to do this is to leave the default file alone and add an override to the props.conf in your own app, which should take precedence over this default.
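A minimal sketch of what such an override might look like, written as a shell snippet so it can be tried in a scratch directory first. Treat it as untested: the directory ./bom_app/local and the app name are hypothetical stand-ins for $SPLUNK_HOME/etc/apps/<your_app>/local, and adding NO_BINARY_CHECK is my assumption, not something confirmed in this thread. The stanza name, sourcetype and priority are copied from the original question. Note that it overwrites any props.conf already in that directory.

```shell
# Scratch directory standing in for $SPLUNK_HOME/etc/apps/<your_app>/local
# (hypothetical app name; adjust to your deployment).
APP_LOCAL="${APP_LOCAL:-./bom_app/local}"
mkdir -p "$APP_LOCAL"

# The quoted heredoc ('EOF') keeps the backslashes in the stanza name literal.
cat > "$APP_LOCAL/props.conf" <<'EOF'
[source::...[/\\]bom[/\\]IDA00001.dat]
sourcetype = bomIDA00001
# Assumption: ask Splunk to process the file even if it is flagged as binary.
NO_BINARY_CHECK = true
priority = 100
EOF

echo "wrote $APP_LOCAL/props.conf"
```

After copying the stanza into the real app and restarting Splunk, running splunk cmd btool props list --debug should show which file each effective setting comes from, which is a quick way to check whether the override or the known_binary default is winning.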

Christian
Path Finder

I think you have to clean the _thefishbucket index on the forwarder; that is where Splunk stores its record of which files have already been indexed.
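A hedged sketch of that reset, reusing the clean eventdata form that appears elsewhere in this thread. Because it makes Splunk forget every monitored file it has already read, the script only prints the commands by default and runs them when RUN=1 is set. The helper names run and reset_fishbucket are made up here, and /opt/splunk is assumed as the install path.

```shell
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# Print each command; execute it only when RUN=1, since this step is destructive.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Cleaning _thefishbucket discards the record of what has been read, so all
# monitored files are read again (expect duplicates unless the target index
# is cleaned as well).
reset_fishbucket() {
    run "$SPLUNK_HOME/bin/splunk" stop
    run "$SPLUNK_HOME/bin/splunk" clean eventdata -f -index _thefishbucket
    run "$SPLUNK_HOME/bin/splunk" start
}

reset_fishbucket
```

Run it once as-is to review the commands, then again with RUN=1 in the environment to actually perform the reset.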

hbhatta
New Member

Hi, I am facing a similar problem. Any resolution?

phoenixdigital
Builder

Unfortunately no. We have moved on from this for now. If you do find a solution, please let us know here.

phoenixdigital
Builder

Unfortunately no. I cleared the monitored directory and then cleaned the indexes with the command

/opt/splunk/bin/splunk stop; /opt/splunk/bin/splunk clean eventdata -f -index bom; /opt/splunk/bin/splunk clean eventdata -f -index bom_summary; /opt/splunk/bin/splunk start

I then retrieved the files again, and Splunk still shows zero events in the index.

Drainy
Champion

Is it possible the timestamping has changed? Just thinking it might be indexing the data, but it's been stamped with a different date/time from the one you are expecting.
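One way to test that theory is to search the index across all time and compare event time with index time per source. The search below is only a sketch: the index name bom comes from the question, the field aliases are made up, and latest=+1y is there on the assumption that events could have been stamped in the future. The script prints the search and only runs it if a Splunk binary is found under SPLUNK_HOME (assumed to be /opt/splunk).

```shell
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# Count events per source over all time, with the earliest and latest event
# timestamps and the most recent time anything was actually indexed.
QUERY='index=bom earliest=0 latest=+1y | stats count min(_time) as first_event max(_time) as last_event max(_indextime) as last_indexed by source | convert ctime(first_event) ctime(last_event) ctime(last_indexed)'

echo "$QUERY"

# Run the search from the CLI only when Splunk is installed at SPLUNK_HOME.
if [ -x "$SPLUNK_HOME/bin/splunk" ]; then
    "$SPLUNK_HOME/bin/splunk" search "$QUERY"
fi
```

If last_indexed is recent but last_event is stuck on an older date, the files are being indexed and the problem is timestamp extraction rather than the monitor input.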
