XML file re-indexed when a new event is added

nivedita_viswan
Path Finder

I have an xml file with the following format:

<?xml version="1.0"?>
<ChangeHistory>
   <Left_2015_01_01_9_45_00___Right_2015_02_02_9_50_00>
     <field A status="modified" value="123">
       <field B xxxxxxx>
           <value 1>
           <value 2>
        <field C xxxx>
        </field C>
       </field B>
     </field A>
   </Left_2015_01_01_9_45_00___Right_2015_02_02_9_50_00>
   <Left_2015_02_02_9_50_00___Right_2015_03_03_9_50_00>
          .
          .
          .
   </Left_2015_02_02_9_50_00___Right_2015_03_03_9_50_00>
</ChangeHistory>

Every time my batch job runs, I have a new entry of

 <Left_previous_run_time___Right_current_run_time> ...  </Left_previous_run_time___Right_current_run_time>

When I monitor a directory containing such a file, the entire file gets re-indexed every time a new entry is added, because the new event is inserted inside the "ChangeHistory" tags rather than at the end of the file. Is there any way to add just the new event to my index?

I have come across the LINE_BREAKER setting, but my understanding is that it only breaks the xml file into separate events, which I can already do with the BREAK_ONLY_BEFORE setting. How do I prevent re-indexing of the entire file in this case?

lguinn2
Legend

Nope. Splunk expects new data to arrive at the end of the file. It is not possible for Splunk to identify changed lines in the middle of the file. And if it did, how would Splunk know if any contextual data needed to be captured along with the changes?

One solution is to write a script that captures each version of the file and identifies the changes. The script could write only the changes to a separate file, and Splunk could be set to index that "changes" file.
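As a rough illustration of that kind of script, here is a minimal Python sketch. It is only one possible approach, not something Splunk provides. It assumes the real file is well-formed XML, that each entry under <ChangeHistory> has a unique Left_..._Right_... tag name, and that entries are never modified once written. The function name export_new_entries and the state file that remembers already-exported tags are hypothetical names chosen for this example.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def export_new_entries(source, changes, state):
    """Append entries not exported on earlier runs to `changes`.

    source  -- the XML file rewritten by the batch job
    changes -- append-only file that Splunk would monitor instead
    state   -- JSON file listing the entry tags already exported
               (hypothetical helper file, not a Splunk feature)
    Returns the list of newly exported tag names.
    """
    source, changes, state = Path(source), Path(changes), Path(state)

    # Tags exported on earlier runs (empty set on the first run).
    seen = set(json.loads(state.read_text())) if state.exists() else set()

    root = ET.parse(source).getroot()  # the <ChangeHistory> element
    new_entries = [child for child in root if child.tag not in seen]

    # Only ever append, so new data always lands at the end of the file.
    with changes.open("a", encoding="utf-8") as out:
        for entry in new_entries:
            entry.tail = None  # drop whitespace that followed the element
            out.write(ET.tostring(entry, encoding="unicode") + "\n")

    seen.update(entry.tag for entry in new_entries)
    state.write_text(json.dumps(sorted(seen)))
    return [entry.tag for entry in new_entries]
```

Running something like export_new_entries("ChangeHistory.xml", "changes.log", "seen_tags.json") after each batch run (the paths are placeholders) and pointing the monitor input at changes.log rather than at the original file would mean only new entries get indexed. Each exported entry can still span several lines, so event breaking such as BREAK_ONLY_BEFORE would still be needed on the changes file.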

rmanrique
Path Finder

How do I do that? Do you have an example?

nivedita_viswan
Path Finder

Thanks for your response.
If this is the case, am I right to assume it's not possible to monitor xml files which are continuously being written to?

lguinn2
Legend

No, many people monitor xml files with Splunk.

What is not possible is to monitor "delta" changes between files with Splunk, whether the file is xml or anything else, unless you write a script as described above.

Splunk expects new information to be written to the end of the file, regardless of the file format.

alexsambacanada
Engager

Hi lguinn,

I see this post is from a year ago. I just wanted to ask whether, to your knowledge, there has been any headway on an easier way (than writing a delta-detecting script) to index a new XML event without re-indexing the entire log. This seems like a common enough task that Splunk might have added some functionality for it in the past year.

Thanks!

AlexW

nivedita_viswan
Path Finder

Got it, thanks again
