Parsing XML data from fields

Kabobgub · ‎04-07-2015

Hello, after a lot of research I still cannot figure out how to solve this problem.
I have an XML file added to Splunk, and I've extracted fields through KV_MODE = xml.

          <result name="MISCONF_STATUS.SUCCESS"><![CDATA[154]]></result>
          <result name="MISCONF_RISK.HIGH"><![CDATA[39]]></result>
          <result name="MISCONF_ALL"><![CDATA[606]]></result>

So I have two fields here: result{@name} and result. The second is the CDATA value. The problem is that they are not connected to each other.
How do I define that MISCONF_STATUS.SUCCESS = 154, and so on?
I tried to make a chart using these two fields, but it is not working at all.

iamtags · ‎08-19-2015

I was running into the same problem where I only needed a simple table merging a couple of xml values from many, and potentially multiple times per event.

To build off of what sideview explained, and from the mvexpand docs, I think I have a way to help you get just the fields you care about in a simple table. Notice that the first few lines are the same as what was already posted:

| rename "result{@name}" as result_name
| fields result_name result
| eval zipped=mvzip(result_name,result)
| mvexpand zipped

This is where the code changes a little to meet what I think you are requesting. You can just rex the two values out of the new field you created:

| rex field=zipped "(?<result_name>\S+),(?<result>\d+)"
| table result_name result

The output should look like this:

result_name            result
MISCONF_STATUS.SUCCESS 154
MISCONF_RISK.HIGH      39
MISCONF_ALL            606

Each name is now paired with its value on the same row, so you can keep only specific rows by appending

| where result_name="MISCONF_ALL" AND result="606"
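
For readers who want to see the rex and where steps outside Splunk, here is a minimal Python sketch of the same logic. It is an illustration under assumptions, not Splunk code: the `parse_zipped` helper and the `rows` list are hypothetical, and the sample strings are the zipped values built from the XML in the question.

```python
import re

# Same pattern as the rex above, written with Python's named-group syntax.
PATTERN = re.compile(r"(?P<result_name>\S+),(?P<result>\d+)")

def parse_zipped(rows):
    """Turn 'name,value' strings into dicts, skipping rows that do not match."""
    parsed = []
    for row in rows:
        match = PATTERN.search(row)  # unanchored search, like rex
        if match:
            parsed.append(match.groupdict())
    return parsed

# Hypothetical stand-in for the values of the zipped field after mvexpand.
rows = ["MISCONF_STATUS.SUCCESS,154", "MISCONF_RISK.HIGH,39", "MISCONF_ALL,606"]
table = parse_zipped(rows)

# Rough equivalent of: | where result_name="MISCONF_ALL" AND result="606"
only_all = [r for r in table
            if r["result_name"] == "MISCONF_ALL" and r["result"] == "606"]
print(only_all)
```

Note that the extracted values are strings, which is why the comparison uses "606" rather than a number.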

For some visualizations you can also change

| table result_name result

to something like

| stats values(result_name) by result

Hope this helps

sideview · ‎04-07-2015

somesoni2's sed-based approach may well be the best one, but here's some fun search language that can do the same.

I'm assuming that you have big multiline events that each have big multivalue values for your two fields "result{@name}" and "result".

| rename "result{@name}" as result_name
| eval zipped=mvzip(result_name,result)
| streamstats count as counter
| mvexpand zipped
| eval zipped = split(zipped,",")
| eval result = mvindex(zipped,0)
| eval {result}=mvindex(zipped,1)
| fields - zipped
| stats values(*) as * by counter
| fields - counter

It's a bit of a circus act, but it'll work. The eval function mvzip can zip two big multivalue fields into a third multivalue field whose values look like "foo1,bar1", "foo2,bar2", etc. Then we kind of take the results apart and put them back together again the way we need them.
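
To make the zip-and-expand idea concrete outside Splunk, here is a minimal Python sketch of what the pipeline above does to a single event. It is only an illustration: the `event` dict and the `pair_fields` helper are hypothetical stand-ins, and the sample values come from the XML in the question.

```python
# Illustration only: mimics mvzip -> mvexpand -> split -> eval {result}=... -> stats
# for one event. The dict layout and the helper name are assumptions.

def pair_fields(event, name_field="result_name", value_field="result"):
    """Return a dict mapping each name to its value, pairing by position."""
    names = event[name_field]
    values = event[value_field]
    # mvzip: join the i-th name and the i-th value with a comma
    zipped = [f"{n},{v}" for n, v in zip(names, values)]
    paired = {}
    # mvexpand: treat each zipped entry as its own row
    for entry in zipped:
        # split + mvindex: element 0 is the name, element 1 is the value
        name, value = entry.split(",", 1)
        # eval {result}=...: the name becomes the new field name
        paired[name] = value
    # stats values(*) as * by counter: the rows collapse back into one record
    return paired

event = {
    "result_name": ["MISCONF_STATUS.SUCCESS", "MISCONF_RISK.HIGH", "MISCONF_ALL"],
    "result": ["154", "39", "606"],
}
print(pair_fields(event))
```

The pairing is purely positional, so it only gives the right answer when both multivalue fields list their entries in the same order.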

Kabobgub · ‎04-07-2015

Thanks. It should work, but in this case I get a table with ALL my fields displayed. Could you tell me how I can use only these two fields?

sideview · ‎04-07-2015

It will work fine with other field values. They should be carried along throughout.

Kabobgub · ‎04-08-2015

The reason it is not suitable is that I have some junk fields in this case. All I need is to connect these two fields and build some visualisation of them. Thanks for your solution, but it differs a little from what I need. I would appreciate any advice for my case.

sideview · ‎04-08-2015

I'm afraid I do not understand the problem you are describing, possibly because it is not a problem at all. Can you describe why you think the other junk field values prevent this solution from giving you your visualization?

Kabobgub · ‎04-09-2015

The problem is that this is part of a very large system, and this search generated too much data for the current visualization. I would really appreciate it if you could tell me how to customize this search, or which commands I need for my goals: for example, seeing only the values of MISCONF_RISK.HIGH, only the values of MISCONF_ALL, or all values except MISCONF_STATUS.SUCCESS. I've tried some ways to do it, but it is too complicated for me.

sideview · ‎04-09-2015

If you just want these two fields, then you want to insert a fields command to explicitly filter out all other fields.

| rename "result{@name}" as result_name
| fields result_name result
| eval zipped=mvzip(result_name,result)
| streamstats count as counter
| mvexpand zipped
| eval zipped = split(zipped,",")
| eval result = mvindex(zipped,0)
| eval {result}=mvindex(zipped,1)
| fields - zipped
| stats values(*) as * by counter
| fields - counter

If you're getting an error that the search generated too much data for the visualization, that has more to do with the visualization you're trying to use. For instance if you try to generate a 1 year timechart with a 5 minute granularity you'll get errors like that in the UI.

somesoni2 · ‎04-07-2015

Try something like this

your base search with field _raw |  rex mode=sed "s/(\>\<\!\[CDATA\[)([^\]]+)(\]\])/ value=\2/g" | spath | rename result{@*} as * | eval {name}=value
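
To show what the sed substitution does to the raw text, here is a small Python sketch that applies an equivalent regular expression to one line of the sample XML. It runs outside Splunk and makes two assumptions that are not from the thread: the pattern is simplified to a single capture group (so `\1` here plays the role of `\2` in the sed expression), and the quoted variant is an added alternative. Whether spath accepts the unquoted form is not confirmed here.

```python
import re
import xml.etree.ElementTree as ET

# One line of the sample XML from the question.
raw = '<result name="MISCONF_RISK.HIGH"><![CDATA[39]]></result>'

# Matches '><![CDATA[...]]' and captures the CDATA payload.
CDATA = re.compile(r"><!\[CDATA\[([^\]]+)\]\]")

# Same replacement shape as the sed expression: the value is left unquoted.
as_posted = CDATA.sub(r" value=\1", raw)
print(as_posted)

# Assumed variant: quote the value so the result is well-formed XML.
quoted = CDATA.sub(r' value="\1"', raw)
element = ET.fromstring(quoted)
print(element.attrib)
```

Python's strict XML parser rejects the unquoted form, so if the rewritten events do not parse as expected, the missing quotes around the value may be worth checking.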

Kabobgub · ‎04-07-2015

It seems to be right, but it is not working.

sideview · ‎04-07-2015

To clarify: the specific XML you posted ends up in a single event, and that event has two fields, both of which have big "multivalue" values, namely (MISCONF_STATUS.SUCCESS, MISCONF_RISK.HIGH, MISCONF_ALL) and (154, 39, 606). If you can confirm this, then I think I can give you a search language answer.

Kabobgub · ‎04-07-2015

Almost. Actually, it sits inside these elements:

<group > 
   <service>
        "this part"
   </service>
</group >

The rest is right.

markthompson · ‎04-07-2015

I would imagine you can use a regex for this. You should be able to generate a field based on a regular expression.
