Splunk Search

How to extract multivalue fields from XML data at search-time?

andra_pietraru
Path Finder

Hi all,

I indexed a XML file and I am trying to extract some fields at search-time.

What I'm trying to do is extract two fields, but I am failing because it is repeating. Right now, Splunk takes only my first match. Any help?? Thanks!

1 Solution

Gilberto_Castil
Splunk Employee
Splunk Employee

You do not need to use a regular expression to obtain the data. Splunk is already instrumented with an automated extraction feature that will look into the XML and assign the value between the tags. A multi-value is normally handled. Let's ellaborate:

Assume that you have a data point as follows:

<switch>
    <managed>true</managed>
       <ipv4address>111.111.111.11</ipv4address>
       <powereddevice>
         <macaddress>000000000001</macaddress>
         <port>11</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000002</macaddress>
         <port>9</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000003</macaddress>
         <port>7</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000004</macaddress>
         <port>5</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000005</macaddress>
         <port>3</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000006</macaddress>
         <port>1</port>
       </powereddevice>
</switch>

Here is a generic inputs.conf configuration:

[monitor:///media/answers/04-15-1015/1/]
disabled = false
index = test
sourcetype = answers-1429106201

This is the companion props.conf. You need to call this explicitly.

[answers-1429106201]
KV_MODE = xml

The end result is something like this:

alt text

This KV_MODE property is a search-time change so you do not need to restart your Splunk instance. You will need to perform that change directly on the Search Head of your deployment.

I hope this helps you,

--
gc

View solution in original post

Gilberto_Castil
Splunk Employee
Splunk Employee

You do not need to use a regular expression to obtain the data. Splunk is already instrumented with an automated extraction feature that will look into the XML and assign the value between the tags. A multi-value is normally handled. Let's ellaborate:

Assume that you have a data point as follows:

<switch>
    <managed>true</managed>
       <ipv4address>111.111.111.11</ipv4address>
       <powereddevice>
         <macaddress>000000000001</macaddress>
         <port>11</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000002</macaddress>
         <port>9</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000003</macaddress>
         <port>7</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000004</macaddress>
         <port>5</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000005</macaddress>
         <port>3</port>
       </powereddevice>
       <powereddevice>
         <macaddress>000000000006</macaddress>
         <port>1</port>
       </powereddevice>
</switch>

Here is a generic inputs.conf configuration:

[monitor:///media/answers/04-15-1015/1/]
disabled = false
index = test
sourcetype = answers-1429106201

This is the companion props.conf. You need to call this explicitly.

[answers-1429106201]
KV_MODE = xml

The end result is something like this:

alt text

This KV_MODE property is a search-time change so you do not need to restart your Splunk instance. You will need to perform that change directly on the Search Head of your deployment.

I hope this helps you,

--
gc

andra_pietraru
Path Finder

Can you give me a hint on how I could retrieve the corresponding port for a specific macaddress?
Right now, when I do the following query, I get all the ports:

sourcetype=xmlConfig switch.powereddevice.macaddress=000000000001 |rename switch.powereddevice.port AS port | table port

So, instead of getting only port "11", I get all "11,9,7,5,3,1" ports.

0 Karma

aweitzman
Motivator

In order to get the corresponding port to a particular MAC address, you need to associate them first using mvzip:

... | eval MACAndPort = mvzip(switch.powereddevice.macaddress,switch.powereddevice.port) | mvexpand MACAndPort | rex field=MACAndPort "(?<MAC>\d+),(?<Port>\d+)" | ...

andra_pietraru
Path Finder

*Update: solved this error: I tried, but I get error "Error in 'eval' command: The arguments to the 'mvzip' function are invalid." Should I add something to the mvzip function?

**Update 2: I also found an answer for this question: But now my question is how could I retrieve only the port, based on a mac address?

Thank you for your help!

0 Karma

aweitzman
Motivator

Glad you figured out mvzip.

Anyways, if you know what your MAC address is, just append | where MAC=themacaddress to the end of your search string.

andra_pietraru
Path Finder

It did help me. Thanks!

0 Karma

fdi01
Motivator

If you check out the doc on the rex command you'll see that max_match= Controls the number of times the regex is matched. If greater than 1, the resulting fields will be multivalued fields. Defaults to 1, use 0 to mean unlimited.
try like this:

...| rex max_match=0    your_regex...
or 
...| rex max_match=0  field=_raw  your_regex...

see this link to understand fine:
http://answers.splunk.com/answers/227331/using-rex-to-extract-multivalue-fields-from-events.html

aweitzman
Motivator

Can you please show us the command you're using? For instance, when I do this:

| gentimes start=-1 
| eval xml="[your XML above]" 
| rename xml as _raw 
| spath 
| table switch.powereddevice.macaddress switch.powereddevice.port

I get two multivalued fields containing all the values.

joxley
Path Finder

+1 for | rename xml as _raw

0 Karma

andra_pietraru
Path Finder

I didn't use a command. I uploaded the XML file and indexed it and now i'm trying to extract fields, because I will later need to use them for another search command.
Just to explain better: I have other events (from a different source) and they have the IP addresses as fields. What I want is: based on an IP address, search this XML file and get back the port and the macaddress

0 Karma

andra_pietraru
Path Finder

Until now I was trying to write a command using xpath, but was not working. I see that spath is much more helpful. Thanks!

0 Karma

aweitzman
Motivator

Glad to hear it. Using that, you should be able to structure your search something like this:

source=myxmlfiles 
| spath 
| search switch.ipv4address="111.111.111.11"
| rename switch.ipv4address AS IPv4 switch.powereddevice.macaddress AS MACAddresses switch.powereddevice.port AS Ports
| table IPv4 MACAddresses Ports
0 Karma

andra_pietraru
Path Finder

How can I find all events that have that MACaddress as a value in their field?
What I want to achieve:
In a subsearch, based on ipv4address, I want to retrieve all MAC addresses. Then, in my main search, I want to find all events that have that MACaddress as a value for the ID field and make a timechart.
This is what I have so far:
sourcetype=perfdata [search sourcetype=xmlConfig switch.ipv4address=111.111.111.11 | rename switch.powereddevice.macaddress AS ID | return ID] | timechart span=5m values(field1)
But I'm doing something wrong, because I get no results back.
Do you have any suggestion?
Thanks!

0 Karma

aweitzman
Motivator

Use fields ID or table ID rather than return ID in your subsearch.

0 Karma

andra_pietraru
Path Finder

I tried. Still not getting any results back.

0 Karma

aweitzman
Motivator

Hm. What happens when you run the following search:

sourcetype=xmlConfig switch.ipv4address=111.111.111.11 | rename switch.powereddevice.macaddress AS ID | table ID | format

By adding | format to the end of it, you can see what the text of the subsearch will be without it running. Is it what you are expecting?

I just realized that ID is a multivalued field, and you might have to expand it. See if this closer to what you are expecting:

sourcetype=xmlConfig switch.ipv4address=111.111.111.11 | rename switch.powereddevice.macaddress AS ID | mvexpand ID | table ID | format

0 Karma

andra_pietraru
Path Finder

I tried adding mvexpand and now it returns all the IDs that I was expecting. Thanks so much!

0 Karma
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...