Getting Data In

Avoid duplicate data and ignore # fields

kmattern
Builder

I have customer systems whose file transfers are logged by IIS. IIS has a 20-minute timeout; when it times out, it immediately restarts but writes a new set of header lines into the log. The date/time stamp on the log file also changes, and Splunk assumes it is a new file.

How can I avoid duplicating data when Splunk attempts to re-index the log, or get Splunk to consume only the new data? And how do I ignore the headers scattered throughout the log file?


wsnyder2
Path Finder

We use the following line in the sourcetype stanza for iis in the props.conf file.

SEDCMD-THROWAWAY-COMMENTS=s/^#.+[\r\n]+#.+[\r\n]+#.+[\r\n]+#.*[\r\n]//g
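
That sed expression strips a block of four consecutive #-prefixed lines, which is what IIS typically writes as a header (#Software, #Version, #Date, #Fields) each time it restarts. For context, a minimal props.conf sketch, assuming your IIS sourcetype is simply named iis:

[iis]
SEDCMD-THROWAWAY-COMMENTS = s/^#.+[\r\n]+#.+[\r\n]+#.+[\r\n]+#.*[\r\n]//g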


ogdin
Splunk Employee

Use INDEXED_EXTRACTIONS=W3C in Splunk 6. We will honor the header found at the top of the file and ignore any line beginning with a # after that. Plus, we do the field extraction automatically from the header so you don't have to mess with props and transforms.

http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime
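
A minimal props.conf sketch of that approach, assuming the sourcetype is named iis and the setting is placed where the file is actually read (the universal forwarder or indexer monitoring the log):

[iis]
INDEXED_EXTRACTIONS = W3C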


lukejadamec
Super Champion

There are two problems here. First, you can remove the extra header lines with additions to inputs.conf, props.conf, and transforms.conf.

Note: I’m using a new sourcetype, so I need a stanza in inputs.conf. If you want to use the existing sourcetype in inputs.conf, then you will need to specify that sourcetype in props.conf (i.e. substitute my winIIS with the sourcetype found in your inputs.conf).

inputs.conf

[monitor://c:\inetpub\logs\Logfiles\W3SVC1\*.log]
sourcetype = winIIS
queue = parsingQueue
index = default
disabled = false

props.conf

[winIIS]
SHOULD_LINEMERGE = false
CHECK_FOR_HEADER = false
REPORT-fields = windows_iis_header
TRANSFORMS-headers = remove_headers

transforms.conf

[remove_headers]
REGEX = ^#.*
DEST_KEY = queue
FORMAT = nullQueue

[windows_iis_header]
FIELDS = "date","time","s_ip", ... (complete the list to match your log header configuration)
DELIMS = " "
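
For illustration only, here is what the completed stanza might look like with the fields IIS logs by default in W3C format. This field list is an assumption; take the names and order from the #Fields header line in your own log:

[windows_iis_header]
FIELDS = "date","time","s_ip","cs_method","cs_uri_stem","cs_uri_query","s_port","cs_username","c_ip","cs_User_Agent","cs_Referer","sc_status","sc_substatus","sc_win32_status","time_taken"
DELIMS = " "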

Here is another example of the same:
http://answers.splunk.com/answers/24986/iis-log-fields-not-parsing

As for the duplication problem, I’ve not seen that. Having the timestamp of the file update is normal, and should not cause a re-read of the file. Splunk hashes the beginning of the file, so if that does not change then it should not be re-read. I’m guessing you have a setting in inputs.conf that is causing it. Can you post your inputs.conf?
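
If it is a setting, the usual suspects are the file-tracking options in the monitor stanza. A sketch for reference only, with the relevant options shown as comments; in particular, crcSalt = <SOURCE> makes Splunk treat a renamed or copied file as new, which can produce duplicates:

[monitor://c:\inetpub\logs\Logfiles\W3SVC1\*.log]
# crcSalt = <SOURCE>    adds the source path to the checksum, so rolled/renamed files are re-read in full
# initCrcLength = 256   number of leading bytes checksummed to decide whether a file has been seen before
# followTail = 0        if set to 1, only data written after monitoring starts is read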
