I need to monitor an application log file, and I have a problem with the default way Splunk "tails" a file. This particular log file doesn't append new rows to the end; it inserts them before the closing tag.
<root>
  <a/>
  <b/>
  <c/>
</root>

becomes

<root>
  <a/>
  <b/>
  <c/>
  <d/>
</root>
This causes Splunk to consider it a completely new file, and it reindexes the whole thing, when I only want <d/>. Has anyone come across this before and solved it? If so, how?
AFAIK, you can't solve this with Splunk's input settings. Splunk gives you only two choices: re-index the whole file each time it changes, or index new data that is appended to the end of the file. There is no way to pick up a new event that appears in the middle of the file.
I think the easiest way to handle this is to write a script that compares the last version of the file with the current version and then outputs the new events to stdout. Use this as a scripted input.
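A minimal sketch of such a scripted input in Python. The paths are illustrative placeholders, not real Splunk defaults; adjust them for your environment:

```python
#!/usr/bin/env python3
"""Scripted input: emit only the lines that are new since the last run.

Compares the current log file against a saved snapshot of its previous
version, prints lines that were not in the snapshot to stdout (which
Splunk indexes), then refreshes the snapshot.
"""
import os
import shutil

LOG_FILE = "/var/log/app/events.xml"          # the file Splunk can't tail (example path)
SNAPSHOT = "/opt/splunk/var/events.snapshot"  # copy of the last version we saw (example path)

def new_lines(log_file, snapshot):
    """Return lines present in log_file but absent from the snapshot."""
    try:
        with open(snapshot) as f:
            seen = set(f)               # previous lines, newline included
    except FileNotFoundError:
        seen = set()                    # first run: everything is new
    with open(log_file) as f:
        return [line for line in f if line not in seen]

if __name__ == "__main__":
    if os.path.exists(LOG_FILE):
        for line in new_lines(LOG_FILE, SNAPSHOT):
            print(line, end="")
        shutil.copyfile(LOG_FILE, SNAPSHOT)  # remember this version for next run
```

One caveat with this set-based comparison: two byte-identical events would be collapsed into one. If that matters for your log, a positional diff (e.g. `difflib.unified_diff`) avoids it.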
Noted for posterity and the search spiders: I have this problem with the XML log files generated by Ipswitch WS_FTP.
The problem presents as a huge number of duplicate events collected, counted against the license, and shown in search (millions, in my case), whereas inspecting the actual log files reveals only hundreds or thousands of events. Piping the events through "... | dedup" reveals the actual number of unique events.
Confirm the issue by running a search that shows the growing gap between the time events were written and the time they were (re)indexed:
... | eval delta=(_time - _indextime) | timechart avg(delta) span=15m
I've solved a similar situation by not "logging" that way. The idea of a log file is to add entries at the end, not in the middle. Splunk checks the start of the file and the region up to where it left off reading; if those checks fail, the file is (correctly) presumed to have been altered rather than appended to.
One way would be to log self-contained XML documents, each with its own root element, if you absolutely have to log XML.
Another way would be to log key-value lines rather than XML nodes inserted into the middle of an XML tree.
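For example, an append-only key=value line per event is trivially tailable. A small sketch (the field names and path are illustrative):

```python
import time

def log_event(path, **fields):
    """Append one event as a single key=value line; never rewrites earlier content."""
    ts = time.strftime("%Y-%m-%dT%H:%M:%S")
    parts = [f"ts={ts}"] + [f'{k}="{v}"' for k, v in sorted(fields.items())]
    with open(path, "a") as f:          # append-only: Splunk's tail works as intended
        f.write(" ".join(parts) + "\n")

# usage (example fields):
# log_event("/var/log/app/transfers.log", user="alice", action="upload")
```

Splunk extracts key="value" pairs like these automatically at search time, so you lose nothing over the XML structure.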