ArchiveProcessor - Bypassing normal system/local/props.conf processing for .dat files inside archives? (4.3.4)

Lucas_K
Motivator

I have a situation where Splunk does not seem to honor the settings in system/local/props.conf for .dat files inside an archive.

Example.

We have the following 6 unique log files. Note: the example below is my own proof of the issue, not how my real-world sources are gathered. However, I have two customer installations that both use the same data format (.dat files inside zip files), so it has a direct customer impact right now.

1.log
2.dat
3.zip (contains 3.log)
4.zip (contains 4.dat)
56.zip (contains 5.log and 6.dat)
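
For anyone who wants to reproduce the layout above, here is a minimal Python sketch that builds the same set of files. The function name `build_fixture`, the target directory and the event text are made up for illustration; the real files came from another source and only the file and archive names match the list above.

```python
import tempfile
import zipfile
from pathlib import Path


def build_fixture(root: Path) -> list[str]:
    """Create the six uniquely named files, four of them inside zip archives."""
    root.mkdir(parents=True, exist_ok=True)

    def event(name: str) -> str:
        # Placeholder content: one distinct event per file, so no two files match.
        return f"unique event from {name}\n"

    # Standalone files.
    for name in ("1.log", "2.dat"):
        (root / name).write_text(event(name))

    # Archive name -> names of the members it contains.
    archives = {
        "3.zip": ["3.log"],
        "4.zip": ["4.dat"],
        "56.zip": ["5.log", "6.dat"],
    }
    # Every archive is written the same way, with the same compression method.
    for archive, members in archives.items():
        with zipfile.ZipFile(root / archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for member in members:
                zf.writestr(member, event(member))

    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_fixture(Path(tmp) / "logs"))
```

Point the monitor input at the generated directory (or copy the files into c:\logs\) to repeat the test.
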

All zip files are created using the same method. All file names are unique. All events inside the files are unique.
I use the following app-based inputs.conf:

[monitor://c:\logs\]
index=logs
sourcetype=logs
followTail=0
alwaysOpenFile = 1
whitelist = \.dat$|\.log$|\.zip$
crcSalt = <SOURCE>
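
As a quick sanity check that the whitelist itself is not what excludes the archives, the regular expression can be tried against the monitored paths. This is only a sketch using Python's `re` module as a stand-in for Splunk's regex engine (an assumption; the pattern is simple enough that the two should agree), and it says nothing about how Splunk treats members inside an archive. The helper name `passes_whitelist` and the `notes.txt` path are hypothetical.

```python
import re

# Same pattern as the whitelist setting in the inputs.conf stanza above.
WHITELIST = re.compile(r"\.dat$|\.log$|\.zip$")


def passes_whitelist(path: str) -> bool:
    """Return True if the whitelist pattern matches somewhere in the path."""
    return WHITELIST.search(path) is not None


if __name__ == "__main__":
    paths = [
        r"c:\logs\1.log",
        r"c:\logs\2.dat",
        r"c:\logs\3.zip",
        r"c:\logs\4.zip",
        r"c:\logs\56.zip",
        r"c:\logs\notes.txt",  # hypothetical file, not part of the test set
    ]
    for path in paths:
        print(path, passes_whitelist(path))
```

All five monitored paths match the pattern, so the whitelist alone does not explain why some archives are skipped.
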

Because .dat is a known binary extension, I also use a system/local/props.conf to stop the files from being ignored (as per http://splunk-base.splunk.com/answers/11118/how-to-monitor-datgz-files ).

[source::.+logs.+....(dat)]
sourcetype=logs
priority = 20

Now the weird thing is that 1.log, 2.dat and 3.zip WILL be indexed correctly, but any archive containing a .dat file will be ignored. So it seems the above stanza works fine for standalone .dat files, but not for ones contained inside archives.

So I check splunkd.log for hints as to what is going on.

 14:50:32.615 +1000 INFO  ArchiveProcessor - reading path=c:\logs\56.zip (seek=0 len=1595)
10-03-2012 14:50:32.615 +1000 INFO  ArchiveProcessor - Archive with path="c:\logs\56.zip" was already indexed as a non-archive, skipping. 

So Splunk believes it has seen the file before even though it hasn't. I can re-salt the files by renaming them and they will all be indexed again, with the exception of any zip file with a .dat file inside.

This then leads to a post at the end of this old May 2011 thread (Splunk v4.2 at the time): http://splunk-base.splunk.com/answers/24578/rolled-logs-compressed-immediately

Is there some magical setting in system/local/props.conf that I need in order to set the sourcetype of .dat files INSIDE archives, or is this a known bug?


Lucas_K
Motivator

Update: I have found a temporary solution to this! (It's bad [I wouldn't be surprised to see someone from Splunk complain loudly NOT to do this] and will probably break your input again on a Splunk upgrade, but by then hopefully it's fixed.)

Simply edit etc/system/default/props.conf and remove dat from the "known_binary" stanza.

So just replace the following.

[source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)]
sourcetype = known_binary

with

[source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)]
sourcetype = known_binary
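
Since the extension list is long and easy to mistype, the edit to the stanza header can also be done programmatically. The following is a rough Python sketch, not anything Splunk ships: `drop_extension` is a hypothetical helper that takes the stanza header line as a string and returns it with one extension removed. Reading and writing props.conf, and taking a backup first, is left to the caller.

```python
import re


def drop_extension(header: str, ext: str) -> str:
    """Remove one extension from a "[source::....(a|b|c)]" stanza header."""
    match = re.fullmatch(r"(\[source::\.\.\.\.\()([^)]*)(\)\])", header.strip())
    if match is None:
        raise ValueError("not a source:: stanza with an extension list")
    prefix, alternatives, suffix = match.groups()
    # Compare whole entries so that removing "dat" leaves "d" and "deb" alone.
    kept = [entry for entry in alternatives.split("|") if entry != ext]
    return prefix + "|".join(kept) + suffix


if __name__ == "__main__":
    # Shortened header for illustration; the real stanza lists many more extensions.
    print(drop_extension("[source::....(d|dat|deb)]", "dat"))
    # prints: [source::....(d|deb)]
```

The same caveat as above applies: a change to the default props.conf will probably be lost on upgrade.
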
