I have a situation where Splunk does not seem to honor the settings in system/local/props.conf for .dat files that are inside an archive.
Example.
We have the following 6 unique log files. Note: the example below is a test I built to prove the issue to myself; it is not how my real-world sources are gathered. However, I have two customer installations that both use the same data format (.dat files inside zip files), so this has a direct customer impact right now.
1.log
2.dat
3.zip (contains 3.log)
4.zip (contains 4.dat)
56.zip (contains 5.log and 6.dat)
All zip files are created using the same method. All file names are unique. All file events inside the files are unique.
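For anyone who wants to reproduce the layout above, here is a minimal Python sketch that builds the same set of files. The function name `build_test_set` and the event text are hypothetical (the original files were not necessarily created this way); the only point is that every file name and every event is unique and all zips are written the same way.

```python
import zipfile
from pathlib import Path


def build_test_set(root: Path) -> list[str]:
    """Create the six unique files in the layout listed above."""
    root.mkdir(parents=True, exist_ok=True)

    def event(name: str) -> str:
        # Hypothetical event text; unique per file so no two files share content.
        return f"2012-10-03 14:50:32 unique event from {name}\n"

    # Standalone files.
    for name in ("1.log", "2.dat"):
        (root / name).write_text(event(name))

    # Archives, all written the same way (deflate-compressed zip).
    archives = {
        "3.zip": ["3.log"],
        "4.zip": ["4.dat"],
        "56.zip": ["5.log", "6.dat"],
    }
    for zip_name, members in archives.items():
        with zipfile.ZipFile(root / zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
            for member in members:
                zf.writestr(member, event(member))

    return sorted(p.name for p in root.iterdir())
```

Calling `build_test_set(Path(r"c:\logs"))` would populate the monitored directory with the five top-level files holding the six log files.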
The following app-based inputs.conf is used:
[monitor://c:\logs\]
index=logs
sourcetype=logs
followTail=0
alwaysOpenFile = 1
whitelist = \.dat$|\.log$|\.zip$
crcSalt = <SOURCE>
Because .dat is a known binary format, I also use a system/local/props.conf to stop the files from being ignored (as per http://splunk-base.splunk.com/answers/11118/how-to-monitor-datgz-files ).
[source::.+logs.+....(dat)]
sourcetype=logs
priority = 20
Now the weird thing is that 1.log, 2.dat and 3.zip WILL be indexed correctly. Any archive containing a .dat file will be ignored. So it seems the above stanza works fine for standalone .dat files but not for ones contained inside archives.
So I checked splunkd.log for hints as to what is going on.
14:50:32.615 +1000 INFO ArchiveProcessor - reading path=c:\logs\56.zip (seek=0 len=1595)
10-03-2012 14:50:32.615 +1000 INFO ArchiveProcessor - Archive with path="c:\logs\56.zip" was already indexed as a non-archive, skipping.
So Splunk believes it has seen the file before even though it hasn't. I can re-salt the files by renaming them and they will all be indexed again, with the exception of any zip file with a .dat file inside.
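To see which archives are affected without reading the log by eye, a small Python sketch like the one below can pull the skipped paths out of splunkd.log. The regular expression is based only on the message wording in the lines quoted above, so it is an assumption that other Splunk versions phrase the message the same way; `skipped_archives` is a hypothetical helper name.

```python
import re

# Matches the ArchiveProcessor "skipping" message as it appears in the log excerpt above.
SKIP_RE = re.compile(
    r'ArchiveProcessor - Archive with path="(?P<path>[^"]+)" '
    r"was already indexed as a non-archive, skipping"
)


def skipped_archives(lines):
    """Return archive paths reported as skipped, in first-seen order, without duplicates."""
    seen = []
    for line in lines:
        match = SKIP_RE.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen
```

It takes any iterable of lines, so an open file handle on splunkd.log can be passed directly: `with open(log_path) as fh: print(skipped_archives(fh))`.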
This led me to a post at the end of this old May 2011 thread (Splunk v4.2 at the time): http://splunk-base.splunk.com/answers/24578/rolled-logs-compressed-immediately
Is there some magical setting in system/local/props.conf that I need in order to set the sourcetype of .dat files INSIDE archives, or is this a known bug?
Update: I have found a temporary solution to this! It's bad (I wouldn't be surprised to see someone from Splunk complain loudly NOT to do this) and it will probably break your input again on a Splunk upgrade, but by then hopefully it's fixed.
Simply edit etc/system/default/props.conf and remove dat from the "known_binary" stanza.
So just replace the following:
[source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|dat|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)]
sourcetype = known_binary
with
[source::....(0t|a|ali|asa|au|bmp|cg|cgi|class|d|deb|del|dot|dvi|dylib|elc|eps|exe|ftn|gif|hlp|hqx|hs|icns|ico|inc|iso|jame|jin|jpeg|jpg|kml|la|lhs|lib|lo|lock|mcp|mid|mp3|mpg|msf|nib|o|obj|odt|ogg|ook|opt|os|pal|pbm|pdf|pem|pgm|plo|png|po|pod|pp|ppd|ppm|ppt|prc|ps|psd|psym|pyc|pyd|rast|rb|rde|rdf|rdr|rgb|ro|rpm|rsrc|so|ss|stg|strings|tdt|tif|tiff|tk|uue|vhd|xbm|xlb|xls|xlw)]
sourcetype = known_binary
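Because the two stanza headers differ only by the missing dat entry, the edit is easy to get wrong by hand (for example, a careless search and replace could also touch the separate d entry). A minimal Python sketch of the string transformation follows. `drop_extension` is a hypothetical helper; it only rewrites the header text passed to it and does not read or write props.conf.

```python
import re


def drop_extension(stanza_header: str, ext: str) -> str:
    """Remove one extension from the (a|b|c) alternation of a [source::....(...)] header."""
    match = re.fullmatch(r"(\[source::.*\()([^()]*)(\)\])", stanza_header.strip())
    if match is None:
        raise ValueError("not a [source::...(a|b|c)] stanza header")
    prefix, alternation, suffix = match.groups()
    # Compare whole entries so that removing "dat" leaves "d" untouched.
    kept = [entry for entry in alternation.split("|") if entry != ext]
    return prefix + "|".join(kept) + suffix
```

For example, `drop_extension("[source::....(d|dat|deb)]", "dat")` gives `[source::....(d|deb)]`.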