Currently, I have a forwarder monitoring a directory of files that are being logged in real time. My indexer is receiving all of the latest info from the forwarder as expected.
I now have the following requirement requested of me:
My thought is to set up the followTail = 1 option and simply turn off the forwarder when I don't need to log real-time events (i.e. between 6:01 AM and 7:59 PM).
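The schedule in this plan boils down to a time-window test. A minimal sketch, assuming a hypothetical cron-driven wrapper script (not part of Splunk) that decides whether the forwarder should be running at a given minute:

```python
from datetime import time

# Assumption: the forwarder is only needed outside 06:01-19:59, as described
# in the question. Times are compared at minute resolution, as cron would.
OFF_START = time(6, 1)  # first minute the forwarder can be stopped
OFF_END = time(20, 0)   # first minute the forwarder is needed again


def forwarder_should_run(now: time) -> bool:
    """Return True unless `now` falls inside the daytime off window."""
    return not (OFF_START <= now < OFF_END)


if __name__ == "__main__":
    for t in (time(5, 59), time(6, 0), time(6, 1), time(12, 0), time(19, 59), time(20, 0)):
        print(t.strftime("%H:%M"), forwarder_should_run(t))
```

In practice cron can express the same window directly with a stop job at 06:01 and a start job at 20:00; a check like this only helps if the wrapper also runs at other times, for example after a reboot.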
Does anyone else have a better idea?
The ignoreOlderThan and followTail options are definitely interesting and might work. But it sounds like the most straightforward approach is to have the originating system rotate logs at 20:00 and 06:00, or even hourly if the system is producing enough logs to justify it (hourly might also be easier to configure in something like log4j). Then use blacklist and whitelist, both of which are well known and surprise-free. The other options are complicated enough that I'd worry about their long-term reliability.
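To make the rotation idea concrete, here is a sketch of a whitelist pattern that only admits files covering the night hours, checked with Python's re module. The file-naming scheme and paths are assumptions (hourly files named after the hour they cover, with the active file already carrying its hour in the name). The same pattern string would go into the whitelist setting of the monitor stanza; Splunk evaluates whitelist as a regex against the file path, and a pattern this simple should behave the same there.

```python
import re

# Hypothetical naming scheme (assumption): one file per hour, named after the
# hour it covers, e.g. /var/log/myapp/app-2012-05-14-21.log.
# Night hours are 20-23 and 00-05.
NIGHT_WHITELIST = re.compile(r"app-\d{4}-\d{2}-\d{2}-(2[0-3]|0[0-5])\.log$")


def would_be_monitored(path: str) -> bool:
    """Return True if the whitelist pattern matches somewhere in the full path."""
    return NIGHT_WHITELIST.search(path) is not None


if __name__ == "__main__":
    for hour in range(24):
        path = f"/var/log/myapp/app-2012-05-14-{hour:02d}.log"
        print(path, would_be_monitored(path))
```

With this naming, files for hours 06 through 19 are simply never matched, so nothing has to be switched on or off during the day.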
As Kristian mentioned, if you could route this data to the nullQueue, that would be the ideal approach and wouldn't require application changes. According to the docs on transforms.conf, _time is a valid field to use as a SOURCE_KEY. So, in theory, you could precompute a series of regular expressions matching the periods 06:01 - 19:59 in time_t format for future dates. Such regexes would probably be nontrivial and would need to be maintained for the life of the system to add new time_t values. I wouldn't suggest trying this at home.
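To see why the _time approach gets unwieldy, this sketch computes the epoch interval that each day's 06:01-19:59 window covers. It assumes UTC; a local time zone with daylight saving would shift the ranges further. Each interval would need its own regex alternative, and even the first one below already crosses a digit boundary (13253... to 13254...).

```python
from datetime import date, datetime, time, timedelta, timezone


def daytime_epoch_ranges(first_day: date, days: int, tz=timezone.utc):
    """Return (start, end) epoch seconds for 06:01:00-19:59:59 on each day.

    Every day produces a new, different numeric range, which is why a set of
    _time regexes would have to be regenerated for as long as the system runs.
    """
    ranges = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        start = datetime.combine(day, time(6, 1, 0), tzinfo=tz)
        end = datetime.combine(day, time(19, 59, 59), tzinfo=tz)
        ranges.append((int(start.timestamp()), int(end.timestamp())))
    return ranges


if __name__ == "__main__":
    for start, end in daytime_epoch_ranges(date(2012, 1, 1), 3):
        print(start, end)
```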
If you can't get the providers of the logfiles to do rotation to help you, then I would suggest filing an enhancement request (ER) asking for something like a _time_of_day key (in the format HH:MM:SS.ssssss or similar) that would be usable in transforms.conf for the purpose of sending data to the nullQueue.
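For contrast, a time-of-day key would reduce the problem to one fixed pattern that is valid for every day. This sketch only models the suggested key: _time_of_day is the hypothetical field from the ER idea, not an existing one, and the UTC default is an assumption.

```python
import re
from datetime import datetime, timezone

# Models the HH:MM:SS.ssssss string the suggested (hypothetical) _time_of_day
# key would expose. Matches 06:01:00.000000 through 19:59:59.999999.
DAYTIME = re.compile(r"^(06:(0[1-9]|[1-5][0-9])|(0[7-9]|1[0-9]):[0-5][0-9]):")


def time_of_day(epoch: float, tz=timezone.utc) -> str:
    """Render an epoch timestamp as HH:MM:SS.ssssss in the given time zone."""
    return datetime.fromtimestamp(epoch, tz).strftime("%H:%M:%S.%f")


def goes_to_null_queue(epoch: float) -> bool:
    """True if the event falls in the daytime window and would be dropped."""
    return DAYTIME.match(time_of_day(epoch)) is not None


if __name__ == "__main__":
    for ts in (1325397659, 1325397660, 1325447999, 1325448000):
        print(time_of_day(ts), goes_to_null_queue(ts))
```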
According to the docs for transforms.conf, date_hour is not a supported field for SOURCE_KEY. So, I'm quite confident it is computed too late. Agreed that dealing with epoch time would be insanely difficult.
Exactly my point regarding the regex for _time. I haven't had the time (or reason) to figure out whether date_hour (which is derived from _time) is computed at the parsing stage, i.e. before the nullQueue routing would take place. Dealing directly with epoch time is likely to cause headaches in the long run.
/k
Perhaps a better option than completely turning off the forwarder would be to simply disable that input. The assumption here is that you may need the forwarder to monitor other files.
I normally pack an app.conf and an inputs.conf in a (rather conveniently named) input app; both files reside under $SPLUNK_HOME/etc/apps/my_input_app/local. The inputs.conf contains the monitor stanza that points to where your files reside, plus other options including followTail=1; the app.conf contains the following:
[install]
state = enabled
I would then have a cron job that runs according to your schedule and changes that setting to:
state = disabled
(and back to state = enabled when you want to collect again).
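The cron job's edit can be a small script. A sketch, assuming the app layout from this answer, a default install path of /opt/splunkforwarder when SPLUNK_HOME is unset (adjust to your system), and a hypothetical script name toggle_input_app.py:

```python
import configparser
import io
import os
import sys
from pathlib import Path

# Assumptions: app layout from this answer; default path used only if
# SPLUNK_HOME is not set in the cron environment.
SPLUNK_HOME = Path(os.environ.get("SPLUNK_HOME", "/opt/splunkforwarder"))
APP_CONF = SPLUNK_HOME / "etc/apps/my_input_app/local/app.conf"


def toggle_state(conf_text: str, enabled: bool) -> str:
    """Return app.conf text with [install] state set to enabled/disabled."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key names exactly as written
    config.read_string(conf_text)
    if not config.has_section("install"):
        config.add_section("install")
    config["install"]["state"] = "enabled" if enabled else "disabled"
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical usage from cron: toggle_input_app.py enable|disable
    if len(sys.argv) != 2 or sys.argv[1] not in ("enable", "disable"):
        sys.exit("usage: toggle_input_app.py enable|disable")
    enable = sys.argv[1] == "enable"
    APP_CONF.write_text(toggle_state(APP_CONF.read_text(), enable))
```

The forwarder will most likely need a restart for the change to take effect (e.g. $SPLUNK_HOME/bin/splunk restart in the same cron job). Also note that configparser drops comments when it rewrites the file, so this only suits a small, machine-managed app.conf like the one above.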
EDIT_1: As Kristian points out, followTail=1 only applies to files the first time they are picked up. After that, Splunk's internal file position records keep track of the file. This means that the fishbucket will tell Splunk where it left off, and it will pick up the old, unnecessary data as well as the real-time events. As I remark below, I would try playing with the ignoreOlderThan setting (using seconds for better resolution).
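Since ignoreOlderThan works from each file's modification time, its effect can be approximated outside Splunk to see which files a given threshold would leave out. A rough sketch, not Splunk's actual implementation; the boundary handling (inclusive at exactly the threshold) is an assumption:

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def files_to_monitor(directory: Path, ignore_older_than_s: float,
                     now: Optional[float] = None) -> List[str]:
    """Approximate ignoreOlderThan: keep only files modified within the threshold."""
    now = time.time() if now is None else now
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and now - p.stat().st_mtime <= ignore_older_than_s
    )


def demo() -> List[str]:
    """Two files: one last modified 2 minutes ago, one 10 seconds ago."""
    now = time.time()
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "day.log").write_text("daytime events\n")
        (d / "night.log").write_text("night events\n")
        os.utime(d / "day.log", (now - 120, now - 120))
        os.utime(d / "night.log", (now - 10, now - 10))
        return files_to_monitor(d, ignore_older_than_s=60, now=now)


if __name__ == "__main__":
    print(demo())
```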
Hope this helps.
I UPDATED my original answer, since you may be on to something here.
No, you're right: the fishbucket won't be purged. He can, though, try to play with the ignoreOlderThan setting (in minutes or seconds for better resolution). But, yes, it is not trivial and requires a lot of testing.
Does enabling/disabling an app or monitor stanza actually clear the fishbucket for the inputs involved, i.e. won't the forwarder just pick up where it left off?
/k
Hmm, I'm not 100% sure that followTail=1 will be honoured in the way one may think. The following is from the docs for inputs.conf:
followTail = [0|1]
* Determines whether to start monitoring at the beginning of a file or at the end (and then index all events that come in after that).
* If set to 1, monitoring begins at the end of the file (like tail -f).
* If set to 0, Splunk will always start at the beginning of the file.
* This only applies to files the first time Splunk sees them. After that, Splunk's internal file position records keep track of the file.
* Defaults to 0.
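The practical consequence of the "first time Splunk sees them" bullet can be shown with a toy model. The class and its methods are invented for illustration, and the dictionary only stands in for the fishbucket (the real one identifies files by more than their path):

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TailingMonitor:
    """Toy model of followTail plus a per-file position record."""
    follow_tail: bool
    positions: Dict[str, int] = field(default_factory=dict)  # fishbucket stand-in

    def read(self, path: str, content: str) -> str:
        """Return the part of `content` that would be indexed on this pass."""
        if path in self.positions:
            start = self.positions[path]  # seen before: followTail is ignored
        else:
            start = len(content) if self.follow_tail else 0  # first sighting
        self.positions[path] = len(content)
        return content[start:]

    def forget(self, path: str) -> None:
        """Drop the stored position, similar to deleting the fishbucket entry."""
        self.positions.pop(path, None)


if __name__ == "__main__":
    m = TailingMonitor(follow_tail=True)
    log = "old\n"
    print(repr(m.read("app.log", log)))  # first sighting: starts at the end
    log += "night1\n"
    print(repr(m.read("app.log", log)))  # new night event is indexed
    log += "day1\n"                       # written while the forwarder was off
    print(repr(m.read("app.log", log)))  # day event is still picked up
    m.forget("app.log")
    log += "day2\n"
    print(repr(m.read("app.log", log)))  # after the "amnesia": tails again
```

In this model, stopping and restarting alone does not skip the daytime lines; only forgetting the stored position makes followTail apply again.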
You have a few options, I guess, some of which may not be feasible:
a) Prevent the log files from being written to during the daytime (6am-8pm), or possibly write to a daytime directory that is not being monitored. Not a very neat solution.
b) Stop the forwarder as you suggested, and delete (parts of) the fishbucket, which should give your forwarder a convenient case of amnesia, thus allowing followTail=1 to work again. Depending on your setup, i.e. what else is being monitored by the forwarder, this is perhaps not so easy and/or may produce strange results. Then again, it may work just fine.
c) Route all events originating during the day to the nullQueue, so they do not get indexed. You would have to craft a regex to match event timestamps for 6am-8pm, but I'm not sure which fields are available to you at this point in the process. This would probably be the neatest way of doing it, but I haven't tried anything similar, so it may not work at all.
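If the events carry their own timestamp in the raw text, the regex can work on that instead of on _time. A sketch assuming every event starts with "YYYY-MM-DD HH:MM:SS" in the log's local time (an assumption about the log format). In transforms.conf the pattern would sit in the REGEX setting of a stanza with DEST_KEY = queue and FORMAT = nullQueue, referenced from props.conf through a TRANSFORMS- setting; that parsing happens on an indexer or heavy forwarder, not on a universal forwarder.

```python
import re

# Assumption: every event starts with "YYYY-MM-DD HH:MM:SS" in the log's own
# local time. Matches 06:00:00 through 19:59:59, i.e. 6am-8pm.
DAYTIME_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2} (0[6-9]|1[0-9]):")


def is_daytime_event(raw: str) -> bool:
    """True if the raw event line would be routed to the nullQueue."""
    return DAYTIME_EVENT.match(raw) is not None


if __name__ == "__main__":
    events = [
        "2012-05-14 05:59:59 INFO night event",
        "2012-05-14 06:00:00 INFO day event",
        "2012-05-14 19:59:59 INFO day event",
        "2012-05-14 20:00:00 INFO night event",
    ]
    for e in events:
        print("drop" if is_daytime_event(e) else "keep", e)
```

Unlike the _time idea, this pattern never needs regenerating, because it only looks at the hour digits of each event.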
UPDATE:
d) As _d_ pointed out, you could work with ignoreOlderThan to control which files will be read by the monitor. The option here would then be to:
i) ensure all logs are rotated at 7.59PM
ii) use ignoreOlderThan=1m for the directory monitor stanza
iii) start the forwarder through cron or whatever at 8.01PM
iv) stop the forwarder through cron or whatever at 6.00AM
This ensures that the events from between 6AM and 8PM will not get indexed, since ignoreOlderThan goes by the modtime of the file.
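Steps i-iii work because the gap between rotation (7.59PM) and forwarder start (8.01PM) is longer than the 1m threshold. A small sketch of that arithmetic, assuming the daytime file is not touched after rotation; treating a file exactly one minute old as not yet ignored is an assumption about the boundary:

```python
from datetime import datetime, timedelta

IGNORE_OLDER_THAN = timedelta(minutes=1)  # ignoreOlderThan = 1m


def is_ignored(last_modified: datetime, forwarder_start: datetime,
               threshold: timedelta = IGNORE_OLDER_THAN) -> bool:
    """True if the file is already older than the threshold when the forwarder starts."""
    return forwarder_start - last_modified > threshold


if __name__ == "__main__":
    start = datetime(2012, 5, 14, 20, 1)           # iii) forwarder starts at 8.01PM
    day_file = datetime(2012, 5, 14, 19, 59)       # i) rotated, last write at 7.59PM
    night_file = datetime(2012, 5, 14, 20, 0, 45)  # new file, last write at 8.00:45PM
    print("daytime file ignored:", is_ignored(day_file, start))
    print("night file ignored:", is_ignored(night_file, start))
```

It also shows the fragile part: in this model, a quiet night file that has not been written in the minute before startup would count as old too.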
Hope this helps, or at least serves as inspiration to somebody more knowledgeable than me to work out the exact steps to take.
regards,
kristian