Getting Data In

Is it possible to tell Splunk not to index the latest log file?

Dark_Ichigo
Builder

I want to configure my indexer so that it does not index the latest, still-populating log file in a directory. What is the best way of doing that? If someone can point me in the right direction, that would be appreciated.

In relation to Kristian's update message below:

a) you have a directory that you monitor (not specific files in it)?
Yes, a directory that contains logs for the same application.

b) you want to index the files in the directory, but not while the application/process is still updating that logfile?
Yes, exactly what I'm trying to do!

c) once the application/process starts on a new file, you want to index the one that was just closed
Yes, I want to index all files that are closed and are no longer being written to.

d) my current file is called RE-11092012-11:15:31.log

e) my finished files are called RE-11092012-11:15:31.log (i.e. the name is unchanged when the file is closed)

f) my [monitor] stanza looks like

[monitor:///opt/splunk/var/RE]
sourcetype = re
index = re
whitelist = RE-\d{8}-\d\d:\d\d:\d\d\.log
recursive = false

g) My log is from a custom-made application built from the ground up.


Dark_Ichigo
Builder

What about MAX_DAYS_AGO or MAX_DAYS_HENCE in props.conf? If I set MAX_DAYS_AGO = 0, for example, will that ignore the current day?
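
For reference, this is roughly how I imagine those settings would look in props.conf (just a sketch with illustrative values; as far as I understand, they constrain how far an event's timestamp may lie in the past or future relative to the current time, rather than telling Splunk to skip a file):

# props.conf -- sketch only, applied to the 're' sourcetype from my inputs.conf
# As I understand it, events whose parsed timestamp falls outside this window
# get their timestamp re-evaluated; the file itself is still read and indexed.
[re]
MAX_DAYS_AGO = 1
MAX_DAYS_HENCE = 1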


kristian_kolb
Ultra Champion

I'm sorry, but I am not aware of any method by which you can instruct Splunk to wait until you are done with the file.

If it had had a different name when finished (e.g. blah.log.1), you could have used blacklist/whitelist methods.
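
As a rough sketch of what that could look like (assuming the closed files were renamed with a .1 suffix, which is not the case here):

# inputs.conf -- sketch only; assumes closed files are renamed to *.log.1
[monitor:///opt/splunk/var/RE]
sourcetype = re
index = re
# only pick up files that have been renamed on close
whitelist = \.log\.1$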

My best suggestion is to tell Splunk to monitor a different directory (e.g. /var/log/RE-finished/) and have a cron job that moves files there from the 'real' directory. If there is more than one file in the 'real' directory, move the oldest ones...
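
Something along these lines could do the moving (a sketch only; the paths and schedule are made up, and it deliberately leaves the newest file alone since that is the one still being written):

#!/bin/sh
# move-finished-logs.sh -- sketch; run from cron, e.g.
#   */5 * * * * /usr/local/bin/move-finished-logs.sh
SRC=/opt/splunk/var/RE           # directory the application writes to
DST=/var/log/RE-finished         # directory that Splunk monitors

cd "$SRC" || exit 1
# list matching files newest first, skip the newest one, move the rest
ls -t RE-*.log 2>/dev/null | tail -n +2 | while read -r f; do
    mv "$f" "$DST/"
done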

/k


kristian_kolb
Ultra Champion

You are monitoring a directory, and you want to index the file once the logging process is done writing to it, right? Then you'd really want to use something like ignoreNewerThan = 1d.

Unfortunately, that configuration parameter does not exist to the best of my knowledge.

So perhaps you can write your log to a different location/name, and have the logrotate-script (or similar) move/rename the file to a location where Splunk will pick it up.
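
The monitor stanza for the pick-up location would then be the usual kind of thing (the path is just an example):

# inputs.conf -- sketch; Splunk only watches the directory the finished files land in
[monitor:///var/log/RE-finished]
sourcetype = re
index = re
recursive = false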


UPDATE:

Just to make sure, please answer the following as an update to the original question:

a) you have a directory that you monitor (not specific files in it)?

b) you want to index the files in the directory, but not while the application/process is still updating that logfile?

c) once the application/process starts on a new file, you want to index the one that was just closed

d) my current file is called ....... (please fill in)

e) my finished files are called .......... (please fill in)

f) my [monitor] stanza looks like ......... (please fill in)

g) my log is from system/application .......... (please fill in)

Hope this helps,

Kristian


Dark_Ichigo
Builder

I have updated my question.


kristian_kolb
Ultra Champion

see update above /k


Dark_Ichigo
Builder

Thank you for your answer, Kristian, but the main reason for posting this question was to avoid making any modifications to the directory where the logs are. I have no choice but to find a solution where the current "latest" file will not be indexed, or even read at all if possible. If only there were a regex where I could compare the current time with the timestamp of the log file. Would that be a possibility?


MuS
Legend

Hi Dark_Ichigo,

Maybe you can work something out by white- or blacklisting your most recent log file.
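
Just as a sketch of what I mean (the file name is taken from your example; since the name changes with every new file, you would have to keep updating the blacklist, which is the weak point of this idea):

# inputs.conf -- sketch only; blacklist the file that is currently being written
[monitor:///opt/splunk/var/RE]
sourcetype = re
index = re
blacklist = RE-11092012-11:15:31\.log$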

cheers,

MuS


Dark_Ichigo
Builder

So nullQueue will actually open and read the file, but won't index it at all, which in this case will reduce some of the load. As a performance question, will Splunk read the file only once and send it to the nullQueue, or will it keep trying?


kristian_kolb
Ultra Champion

Well, sending stuff to the nullQueue will just prevent the events from being indexed. From a Splunk perspective, the file has been read and the events discarded, and the events therein would not be re-read by Splunk when the file is closed for writing.
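
For reference, nullQueue routing is set up with a props.conf/transforms.conf pair along these lines (a sketch; the regex here is just a catch-all placeholder):

# props.conf -- attach a routing transform to the 're' sourcetype
[re]
TRANSFORMS-null = setnull

# transforms.conf -- matching events are routed to the nullQueue and never indexed
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue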

ignoreOlderThan will look at the timestamp of the file, and ignore files that are older than the specified value. I believe that is quite the opposite of what you want.
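
For completeness, ignoreOlderThan goes in the monitor stanza in inputs.conf, something like this (illustrative value):

# inputs.conf -- sketch; files whose modification time is older than 7 days are skipped
[monitor:///opt/splunk/var/RE]
sourcetype = re
index = re
ignoreOlderThan = 7d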

/k


MuS
Legend

ignoreOlderThan would only be useful for older files, not the current one. nullQueue could be what you are looking for.


Dark_Ichigo
Builder

Can I use nullQueue to get rid of unwanted events by writing a regex that compares the timestamp of the event to the current time and then sends the event to the nullQueue to prevent it from being indexed?


Dark_Ichigo
Builder

Hey mate, would ignoreOlderThan be what I'm looking for?


Dark_Ichigo
Builder

If my question doesn't make any sense, please let me know.
