I have an xml file with the following format:
<?xml version="1.0"?>
<ChangeHistory>
<Left_2015_01_01_9_45_00___Right_2015_02_02_9_50_00>
<field A status="modified" value="123">
<field B xxxxxxx>
<value 1>
<value 2>
<field C xxxx>
</field C>
</field B>
</field A>
</Left_2015_01_01_9_45_00___Right_2015_02_02_9_50_00>
<Left_2015_02_02_9_50_00___Right_2015_03_03_9_50_00>
.
.
.
</Left_2015_02_02_9_50_00___Right_2015_03_03_9_50_00>
</ChangeHistory>
Every time my batch job runs, a new entry of this form is added:
<Left_previous_run_time___Right_current_run_time> ... </Left_previous_run_time___Right_current_run_time>
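Each entry's tag name encodes both run times. If the entries are ever handled by a script, those times can be recovered from the tag and used, for example, as the event timestamp. Below is a minimal Python sketch that assumes every tag follows the `Left_YYYY_MM_DD_H_MM_SS___Right_YYYY_MM_DD_H_MM_SS` pattern seen in the sample. The function name `parse_entry_tag` is hypothetical.

```python
from datetime import datetime

# Assumed from the sample tags, e.g. 2015_01_01_9_45_00 (the hour may be one digit).
TIME_FORMAT = "%Y_%m_%d_%H_%M_%S"


def parse_entry_tag(tag):
    """Split a Left_<previous>___Right_<current> tag into two datetimes."""
    left, right = tag.split("___")  # raises ValueError if "___" is not present exactly once
    if not (left.startswith("Left_") and right.startswith("Right_")):
        raise ValueError(f"unexpected entry tag: {tag}")
    previous_run = datetime.strptime(left[len("Left_"):], TIME_FORMAT)
    current_run = datetime.strptime(right[len("Right_"):], TIME_FORMAT)
    return previous_run, current_run
```

For the first entry in the sample, `parse_entry_tag("Left_2015_01_01_9_45_00___Right_2015_02_02_9_50_00")` returns `(datetime(2015, 1, 1, 9, 45), datetime(2015, 2, 2, 9, 50))`.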
When I monitor a directory containing such a file, the entire file gets re-indexed every time a new entry is added, because the new event is inserted inside the "ChangeHistory" tags rather than appended to the end of the file. Is there any way to index just the new event?
I have come across the LINE_BREAKER setting, but my understanding is that it only breaks the XML file into separate events, which I can already do with the BREAK_ONLY_BEFORE setting. How do I prevent the entire file from being re-indexed in this case?
Nope. Splunk expects new data to arrive at the end of the file. Splunk cannot identify changed lines in the middle of a file. And even if it could, how would it know whether any contextual data needed to be captured along with the changes?
One solution is to write a script that captures each version of the file and identifies the changes. The script could write only the changes to a separate file, and Splunk could then be set up to index that "changes" file.
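Here is a minimal Python sketch of that idea. It assumes the real ChangeHistory file is well-formed XML (the sample above uses placeholder element names) and that each entry's tag name is unique. Rather than diffing whole file versions line by line, it records which entry tags it has already copied in a small state file and appends any unseen entries to a separate delta file. The function names and file paths are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

ENTRY_PREFIX = "Left_"  # entries look like <Left_<previous>___Right_<current>>


def load_seen(state_path):
    """Return the set of entry tags already copied to the delta file."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, encoding="utf-8") as state:
        return {line.strip() for line in state if line.strip()}


def append_new_entries(source_path, delta_path, state_path):
    """Copy entries not seen on earlier runs to the end of delta_path.

    Returns the number of entries appended.
    """
    # Assumes the real ChangeHistory file is well-formed XML.
    root = ET.parse(source_path).getroot()
    seen = load_seen(state_path)
    new_entries = [
        child for child in root
        if child.tag.startswith(ENTRY_PREFIX) and child.tag not in seen
    ]
    if not new_entries:
        return 0
    # Append only: existing content of the delta file is never rewritten,
    # so a file monitor only ever sees new data at the end.
    with open(delta_path, "a", encoding="utf-8") as delta:
        for entry in new_entries:
            # tostring() also emits the element's tail whitespace; strip it
            # so consecutive entries are separated by a single newline.
            text = ET.tostring(entry, encoding="unicode").strip()
            delta.write(text + "\n")
    # Remember what was copied so the next run skips it.
    with open(state_path, "a", encoding="utf-8") as state:
        for entry in new_entries:
            state.write(entry.tag + "\n")
    return len(new_entries)
```

You would call `append_new_entries(...)` after each batch job (for example from cron) and point Splunk at the delta file only, for instance with a `[monitor://...]` stanza in inputs.conf. Because entries are only ever appended there, each run's entry should be indexed once. If the script stops between writing the delta file and updating the state file, the next run appends those entries again, so duplicates are possible in that edge case.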
How do I do that? Do you have an example?
Thanks for your response.
If this is the case, am I right to assume that it's not possible to monitor XML files that are continuously being written to?
No, many people monitor xml files with Splunk.
What Splunk cannot do is monitor "delta" changes between versions of a file, whether the file is XML or anything else, unless you write a script as described above.
Splunk expects new information to be written to the end of the file, regardless of the file format.
Hi Iguinn,
I see this post is from a year ago. To your knowledge, has there been any progress on an easier way (than writing a delta script) to index a new XML event without re-indexing the entire log? This seems like a common task that Splunk might have added functionality for in the past year.
Thanks!
AlexW
Got it, thanks again