
How to extract multiple same name child attributes from XML data into their own unique fields?

bwheelock
Path Finder

I have some XML data broken down into events. Each event contains several child elements that share the same names (Name, Address, City, and so on) under different parents, so they are really distinct fields. For each event, I need to extract each parent's children into their own uniquely named fields. This is probably easy to accomplish, but I cannot seem to figure it out.

Referencing the sample data below, I need to extract the vendor's information as VendorName, VendorCity, etc., and likewise ClientName, ClientCity, etc. for the client and the same for the supplier. Automatic extraction doesn't work in this case without transforms, and regular expressions are proving difficult because the real data has multiple addresses and phone numbers and might lack some of this information. Each XML file (event) has about 100 lines.

I can get the information easily with spath, but as far as I can tell spath only works as a search-time extraction, with one piped spath command (output and path) per field. That's fine for an occasional search or report, but for the full set of fields I'm piping in about 50 lines.

What am I doing wrong?

<OrderForm>
  <ClientOrder PO="00000123">
    <Vendor ID="789">
      <Name>Paperclips, INC</Name>
      <Address>
        <Street>789 Paper St</Street>
        <City>San Francisco</City>
        <State>CA</State>
        <Zip>84989</Zip>
      </Address>
    </Vendor>
    <Supplier ID="224">
      <Name>Happy Paper Co.</Name>
      <Address>
        <Street>12455 Shipping Ave</Street>
        <City>Los Angeles</City>
        <State>CA</State>
        <Zip>92254</Zip>
      </Address>
    </Supplier>
    <Client ID="4152">
      <Name>Dunder Mifflin Infinity</Name>
      <Address>
        <Street>1725 Slough Ave</Street>
        <City>Scranton</City>
        <State>PA</State>
        <Zip>18503</Zip>
      </Address>
    </Client>
  </ClientOrder>
</OrderForm>

anhtran
New Member

How can you use spath to get VendorName, VendorCity, etc.? Thanks.


bwheelock
Path Finder
index=test_orders sourcetype=orderForms
| spath output=VendorName path=OrderForm.ClientOrder.Vendor.Name
| spath output=VendorCity path=OrderForm.ClientOrder.Vendor.Address.City
| spath output=ClientName path=OrderForm.ClientOrder.Client.Name
| spath output=ClientCity path=OrderForm.ClientOrder.Client.Address.City
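
To sanity-check the dotted paths outside Splunk, the lookups above can be mirrored in a few lines of Python. This is only a sketch, not how spath is implemented: the function name `extract_fields` and the trimmed sample event are made up for illustration, and it handles plain element paths only (no attributes or repeated elements).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample event from the question (illustration only).
SAMPLE_EVENT = """
<OrderForm>
  <ClientOrder PO="00000123">
    <Vendor ID="789">
      <Name>Paperclips, INC</Name>
      <Address><City>San Francisco</City></Address>
    </Vendor>
    <Client ID="4152">
      <Name>Dunder Mifflin Infinity</Name>
      <Address><City>Scranton</City></Address>
    </Client>
  </ClientOrder>
</OrderForm>
"""

# Output field name -> dotted path, as in the spath search above.
FIELD_PATHS = {
    "VendorName": "OrderForm.ClientOrder.Vendor.Name",
    "VendorCity": "OrderForm.ClientOrder.Vendor.Address.City",
    "ClientName": "OrderForm.ClientOrder.Client.Name",
    "ClientCity": "OrderForm.ClientOrder.Client.Address.City",
}


def extract_fields(xml_text, field_paths):
    """Resolve each dotted path against the event; skip paths that match nothing."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name, dotted in field_paths.items():
        head, *rest = dotted.split(".")
        if head != root.tag:
            continue  # path does not start at this document's root element
        node = root.find("/".join(rest)) if rest else root
        if node is not None and node.text is not None:
            fields[name] = node.text.strip()
    return fields


if __name__ == "__main__":
    for key, value in extract_fields(SAMPLE_EVENT, FIELD_PATHS).items():
        print(f"{key}={value}")
```

Because every path is anchored at its parent (Vendor, Client), the shared child names never collide, and a path that matches nothing simply produces no field.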

anhtran
New Member

Ah, very neat, thank you!


miteshvohra
Contributor

If every XML file is a single event, you could try these props.conf settings:

LINE_BREAKER = ((?!))
SHOULD_LINEMERGE = false
#BREAK_ONLY_BEFORE = <OrderForm>
DATETIME_CONFIG = NONE
LEARN_MODEL = false
#MAX_EVENTS = 200000
TRUNCATE = 0

Let us know what worked for you.

Mitesh.


bjoernjensen
Contributor

Hi,

Maybe splitting the XML at the level of "Vendor", "Supplier", and "Client" could help. To do that, use BREAK_ONLY_BEFORE in your props.conf:
http://docs.splunk.com/Documentation/Splunk/6.0.2/Admin/propsconf

with something like this regex:
BREAK_ONLY_BEFORE = ^\s+\<Vendor ID="\d+"\>|^\s+\<Supplier ID="\d+"\>|^\s+\<Client ID="\d+"\>

You may still have to handle the lines that fall between two <OrderForm> elements (e.g. by sending them to the nullQueue).


bwheelock
Path Finder

I'd lose the corresponding order ID though, in this case the PO#.
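
To make the concern concrete, here is a rough Python sketch (not Splunk configuration) of what a per-party split would have to do to keep the PO: copy the attribute from the parent ClientOrder onto every Vendor, Supplier, and Client record. The function name `split_parties` and the output keys are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample order from the question (illustration only).
ORDER_XML = """
<OrderForm>
  <ClientOrder PO="00000123">
    <Vendor ID="789">
      <Name>Paperclips, INC</Name>
      <Address><City>San Francisco</City></Address>
    </Vendor>
    <Supplier ID="224">
      <Name>Happy Paper Co.</Name>
      <Address><City>Los Angeles</City></Address>
    </Supplier>
    <Client ID="4152">
      <Name>Dunder Mifflin Infinity</Name>
      <Address><City>Scranton</City></Address>
    </Client>
  </ClientOrder>
</OrderForm>
"""


def split_parties(xml_text):
    """Return one record per party, each carrying the PO of its parent order."""
    root = ET.fromstring(xml_text)
    records = []
    for order in root.iter("ClientOrder"):
        po = order.get("PO")  # lives on the parent, so it must be copied down
        for party in order:  # Vendor, Supplier, Client in document order
            records.append({
                "PO": po,
                "Role": party.tag,
                "ID": party.get("ID"),
                "Name": party.findtext("Name"),
                "City": party.findtext("Address/City"),
            })
    return records


if __name__ == "__main__":
    for record in split_parties(ORDER_XML):
        print(record)
```

Breaking events purely on the Vendor, Supplier, and Client opening tags has no equivalent of the `po = order.get("PO")` step, which is why the order reference would be lost.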


anhtran
New Member

Did this solution work for you? Thank you.
