I have some XML data broken down into events. Each event has multiple child elements that share the same name (Name, Address, City, and so on) but are distinctly different fields, because they sit under different parents. What I need to do, for each event, is extract each of these child sections into its own uniquely named field. This is probably extremely easy to accomplish, but I cannot seem to figure it out.
Referencing the sample data below, I need to extract the vendor's information as VendorName, VendorCity, etc., and likewise for the client and supplier I need ClientName, ClientCity, etc. Automatic extraction obviously doesn't work in this case without transforms, and regular expressions are proving difficult because the real data has multiple addresses and phone numbers, and some events lack part of this information. Each XML file (event) has about 100 lines.
I can get the information easily with spath, but as far as I can tell spath only works for search-time extraction, by piping it into the search with the output and path arguments. That's fine for a search or report here or there, but otherwise I'd be piping in about 50 spath lines every time.
What am I doing wrong?
<OrderForm>
  <ClientOrder PO="00000123">
    <Vendor ID="789">
      <Name>Paperclips, INC</Name>
      <Address>
        <Street>789 Paper St</Street>
        <City>San Francisco</City>
        <State>CA</State>
        <Zip>84989</Zip>
      </Address>
    </Vendor>
    <Supplier ID="224">
      <Name>Happy Paper Co.</Name>
      <Address>
        <Street>12455 Shipping Ave</Street>
        <City>Los Angeles</City>
        <State>CA</State>
        <Zip>92254</Zip>
      </Address>
    </Supplier>
    <Client ID="4152">
      <Name>Dunder Mifflin Infinity</Name>
      <Address>
        <Street>1725 Slough Ave</Street>
        <City>Scranton</City>
        <State>PA</State>
        <Zip>18503</Zip>
      </Address>
    </Client>
  </ClientOrder>
</OrderForm>
How do you use spath to get VendorName, VendorCity, etc.? Thanks.
index=test_orders sourcetype=orderForms
| spath output=VendorName path=OrderForm.ClientOrder.Vendor.Name
| spath output=VendorCity path=OrderForm.ClientOrder.Vendor.Address.City
| spath output=ClientName path=OrderForm.ClientOrder.Client.Name
| spath output=ClientCity path=OrderForm.ClientOrder.Client.Address.City
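For anyone who wants to check outside Splunk which prefixed fields such paths should yield, here is a minimal Python sketch. It is not how spath is implemented; the function name flatten_order and the trimmed SAMPLE string are made up for illustration, and it assumes each event is one well-formed OrderForm document with a single ClientOrder.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample event from the question (illustrative only).
SAMPLE = """<OrderForm>
  <ClientOrder PO="00000123">
    <Vendor ID="789">
      <Name>Paperclips, INC</Name>
      <Address><City>San Francisco</City></Address>
    </Vendor>
    <Client ID="4152">
      <Name>Dunder Mifflin Infinity</Name>
      <Address><City>Scranton</City></Address>
    </Client>
  </ClientOrder>
</OrderForm>"""


def flatten_order(xml_text):
    """Return a flat dict with each leaf element prefixed by its party tag."""
    root = ET.fromstring(xml_text)            # root is <OrderForm>
    order = root.find("ClientOrder")
    fields = {"PO": order.get("PO")}
    for party in order:                       # Vendor, Supplier, Client, ...
        fields[party.tag + "ID"] = party.get("ID")
        for leaf in party.iter():
            # Keep only elements that have text and no children.
            # Caveat: a repeated leaf (e.g. a second Address) overwrites
            # the earlier value in this sketch.
            if len(leaf) == 0 and leaf.text and leaf.text.strip():
                fields[party.tag + leaf.tag] = leaf.text.strip()
    return fields


if __name__ == "__main__":
    for name, value in flatten_order(SAMPLE).items():
        print(name, "=", value)
```

The resulting keys (VendorName, VendorCity, ClientName, ClientCity) line up with the output names used in the spath search above, which makes it a quick way to sanity-check a long list of paths before pasting them into a search.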
Ah, very neat, thank you!
If every XML file is a single event, you could try these props.conf settings:
LINE_BREAKER = (?!)
SHOULD_LINEMERGE = false
#BREAK_ONLY_BEFORE = <OrderForm>
DATETIME_CONFIG = NONE
LEARN_MODEL = false
#MAX_EVENTS = 200000
TRUNCATE = 0
Let us know what worked for you.
Mitesh.
Hi,
Maybe splitting the XML at the level of "Vendor", "Supplier", and "Client" could help. To do that, use BREAK_ONLY_BEFORE (which only takes effect when SHOULD_LINEMERGE = true) in your props.conf:
http://docs.splunk.com/Documentation/Splunk/6.0.2/Admin/propsconf
with something like this regex:
BREAK_ONLY_BEFORE = ^\s+\<Vendor ID="\d+"\>|^\s+\<Supplier ID="\d+"\>|^\s+\<Client ID="\d+"\>
You may still have to do "something" with the parts between two <OrderForm> elements (e.g. send them to the nullQueue).
I'd lose the corresponding order ID though, in this case the PO#.
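To make the concern about losing the PO concrete, here is a rough Python sketch that applies the regex from the suggestion above to a shortened, indented copy of the sample. It only imitates "start a new event before a matching line"; it is not Splunk's actual line-merging logic, and split_events and LINES are names invented for this illustration. It also assumes the raw files are indented, since the pattern requires leading whitespace (^\s+) and would not match unindented lines.

```python
import re

# Pattern copied from the BREAK_ONLY_BEFORE suggestion above.
BREAK_BEFORE = re.compile(
    r'^\s+\<Vendor ID="\d+"\>|^\s+\<Supplier ID="\d+"\>|^\s+\<Client ID="\d+"\>'
)

# Shortened, indented version of the sample event (illustrative only).
LINES = [
    '<OrderForm>',
    '  <ClientOrder PO="00000123">',
    '    <Vendor ID="789">',
    '      <Name>Paperclips, INC</Name>',
    '    </Vendor>',
    '    <Supplier ID="224">',
    '      <Name>Happy Paper Co.</Name>',
    '    </Supplier>',
    '    <Client ID="4152">',
    '      <Name>Dunder Mifflin Infinity</Name>',
    '    </Client>',
    '  </ClientOrder>',
    '</OrderForm>',
]


def split_events(lines):
    """Group lines into events, starting a new one before each matching line."""
    events, current = [], []
    for line in lines:
        if BREAK_BEFORE.search(line) and current:
            events.append(current)
            current = []
        current.append(line)
    if current:
        events.append(current)
    return events


if __name__ == "__main__":
    for number, event in enumerate(split_events(LINES), start=1):
        has_po = any("PO=" in line for line in event)
        print("event", number, "lines:", len(event), "contains PO:", has_po)
```

In this sketch only the leading fragment (the OrderForm and ClientOrder opening tags) carries the PO attribute; the Vendor, Supplier, and Client events do not, so they could not be tied back to their order without extra work.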
Did this solution work for you? Thank you.