Solved: Parse XML file with repeating events and indexed a...

i2sheri · ‎09-17-2015

I am indexing an XML file which has the structure below with many other tags in, before, and after <a> and <b>

I need -> | eval diffInSec= end-start | table status start end diffInSec for each c below and for all events.

I'm indexing the whole file as a single event, but I want each status as one record from all events. Is it possible?

<?xml?>
<a>
    <b><c><c>

        <c>
            <status status="P" start="DT" end="DT"></status>
        </c>
        <c>
            <status status="P" start="DT" end="DT"></status>
        </c>
        <c>
            <status status="P" start="DT" end="DT"></status>
        </c>
        <c>
            <status status="P" start="DT" end="DT"></status>
        </c>

    </c></c></b>
</a>

somesoni2 · ‎09-18-2015

If you've KV_MODE=xml in props.conf (on search head), then Splunk should automatically extract fields (mostly multivalued) from your xml. Search for your data in Verbose mode and you should see fields (on left field sidebar) may be like this

a.b.c.status{@status}
a.b.c.status(@start}
a.b.c.status(@end}

Once you get these fields, use mvzip/mvexpand/rex to get these as separate row and get your table. May be like this (sample, adjust based on your fields)

your base search | table a.b.c.status* | rename a.b.c.status{@*} as * | eval temp=mvzip(status,mvzip(start,end,"#"),"#") | table temp | mvexpand temp | rex field=temp "(?<status>.*)#(?<start>.*)#(?<end>.*)#" | field - temp

Above should give you a table with status, start and end. Then based on your timestamp format, your can calculate the different.

View solution in original post

somesoni2 · ‎09-18-2015

If you've KV_MODE=xml in props.conf (on search head), then Splunk should automatically extract fields (mostly multivalued) from your xml. Search for your data in Verbose mode and you should see fields (on left field sidebar) may be like this

a.b.c.status{@status}
a.b.c.status(@start}
a.b.c.status(@end}

Once you get these fields, use mvzip/mvexpand/rex to get these as separate row and get your table. May be like this (sample, adjust based on your fields)

your base search | table a.b.c.status* | rename a.b.c.status{@*} as * | eval temp=mvzip(status,mvzip(start,end,"#"),"#") | table temp | mvexpand temp | rex field=temp "(?<status>.*)#(?<start>.*)#(?<end>.*)#" | field - temp

Above should give you a table with status, start and end. Then based on your timestamp format, your can calculate the different.

i2sheri · ‎09-21-2015

which is better of the two below

host=h index=i sourcetype=xml
| eval start=strptime('a.b.c.status{@starttime}',"%Y%m%d %H:%M:%S.%3Q")
| eval end=strptime('a.b.c.status{@endtime}',"%Y%m%d %H:%M:%S.%3Q")
| eval temp=mvzip('a.b.c.status{@status}', mvzip(end, start, "#"), "#") 
| table temp | mvexpand temp | rex field=temp "(?<status>.*)#(?<end>.*)#(?<start>.*)" 
| eval diffInSec = end-start
| fields status diffInSec end start

OR

host=h index=i sourcetype=xml
| table a.b.c.status* | rename a.b.c.status{@*} as *
| eval temp=mvzip(status,mvzip(starttime,endtime,"#"),"#") 
| table temp | mvexpand temp | rex field=temp "(?<status>.*)#(?<starttime>.*)#(?<endtime>.*)"
| eval start=strptime(starttime,"%Y%m%d %H:%M:%S.%3Q")
| eval end=strptime(endtime,"%Y%m%d %H:%M:%S.%3Q")
| eval diffInSec=end-start | fields status diffInSec end start

i2sheri · ‎09-20-2015

rex does not work. pasted one of the temp entries below.

PASS PASS PASS PASS PASS PASS PASS PASS PASS PASS PASS PASS PASS PASS PASS#20150921 06:00:15.379 20150921 06:00:15.395 20150921 06:00:15.408 20150921 06:00:15.415 20150921 06:00:15.433 20150921 06:00:15.445 20150921 06:00:15.452 20150921 06:00:15.457 20150921 06:00:15.485 20150921 06:00:15.494 20150921 06:00:15.499 20150921 06:00:15.513 20150921 06:00:15.523 20150921 06:00:15.556 20150921 06:00:15.570#20150921 06:00:15.385 20150921 06:00:15.405 20150921 06:00:15.413 20150921 06:00:15.419 20150921 06:00:15.437 20150921 06:00:15.450 20150921 06:00:15.455 20150921 06:00:15.473 20150921 06:00:15.492 20150921 06:00:15.497 20150921 06:00:15.509 20150921 06:00:15.520 20150921 06:00:15.527 20150921 06:00:15.568 20150921 06:00:15.572

i2sheri · ‎09-21-2015

I removed |xmlkv from base search and it works with two changes

 your base search | table a.b.c.status* | rename a.b.c.status{@*} as * | eval temp=mvzip(status,mvzip(start,end,"#"),"#") | table temp | mvexpand temp | rex field=temp "(?<status>.*)#(?<start>.*)#(?<end>.*)" | fields - temp

richgalloway · ‎09-18-2015

It's probably not the best way, but you could do it with rex. That will give you three multi-valued fields. Combine them with mvzip then split the result into separate events and should have what you want.

... | rex max_match=0 "status=\"(?P<status>[^\"]*)\" start=\"(?P<start>[^\"]*)\" end=\"(?P<end>[^\"]*)\"" | eval fields = mvzip(mvzip(status,start),end) | mvexpand fields | ...

---
If this reply helps you, Karma would be appreciated.

Parse XML file with repeating events and indexed as one event

Welcome to the Splunk Community!

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Adoption of RUM and APM at Splunk