Splunk Search

How to extract XML data from mixed content into one field for later use with spath?

roshannon
New Member

I have a mixed output log that contains XML and non-XML data. I am looking to extract the XML data into a field that I can later use spath on to get individual fields. My sample data is below. I am looking to get the entire <root>*<\root> into a single field that later I can use spath to get individual fields that I might want to search on. I have seen other recommendations to put XML into a single field for later spath usage, but did not see how to do that.

2015 May 22 15:23:44:024 GMT -0700 BW.DomainDMSEvents-DomainDMSEvents-P01 User [BW-User] - Job-10003 [UtilityProcesses/CreateAuditTrail.process/Log]: AuditTrail: 10003|Projects/DomainDMSEvents/ProcDefs/Starters/PublishDMSScanEvents.process||file|||2015-05-22T15:23:44.022-07:00|DomainDMSEvents-DomainDMSEvents-P01||||false||
|<root>
    <messageIn>
        <channel>file</channel>
        <msgID>1432333424013</msgID>
        <corlID>1432333424013</corlID>
        <raw><?xml version="1.0" encoding="UTF-8"?>
   <ns0:EventSourceOuputNoContentClass xmlns:ns0="http://www.tibco.com/namespaces/tnt/plugins/file"><action>remove</action><timeOccurred>1432333424013</timeOccurred><fileInfo><fullName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</fullName><fileName>DMSEvents.txt</fileName><location>/nfs/appdata/CTSE/OMS/DMS</location><configuredFileName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</configuredFileName><type>file</type><readProtected>true</readProtected><writeProtected>true</writeProtected><size>5651</size><lastModified>2015-05-20T12:07:28-07:00</lastModified></fileInfo></ns0:EventSourceOuputNoContentClass></raw>
            <EMSHeaderProperties>
                <header>
                    <name>fileNewName</name>
                    <value>/nfs/appdata/CTSE/OMS/DMS/processed/DMSEvents.txt</value>
                </header>
                <header>
                    <name>fileName</name>
                    <value>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</value>
                </header>
                <header>
                    <name>timestamp</name>
                    <value>1432333424017</value>
                </header>
            </EMSHeaderProperties>
            <parsed>
                <type>filePoller</type>
                <other/>
            </parsed>
        </messageIn>
        <messageOut>
            <name>DocImageEvent</name>
            <TXInfo>
                <tranType>DocImageEvent</tranType>
                <evtType>DocImageEvent</evtType>
                <topicOverride>Domain.CTS.CTSE.Canonical.S2C.DomainDMSEvents.DocImageEvent</topicOverride>
            </TXInfo>
        </messageOut>
        <psDef>
            <funcArea>S2C</funcArea>
            <appSource>DomainDMSEvents</appSource>
            <txIdentifier>DocImageEvent</txIdentifier>
            <startTS>1432333424017</startTS>
        </psDef>
    </root>|
0 Karma
1 Solution

maciep
Champion

Not sure how consistent that log format is, but something like this seems to work for me in a limited test env. I'm just using rex to grab the "*" portion of the event and throw it in a field called xml_field

... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"

View solution in original post

maciep
Champion

Not sure how consistent that log format is, but something like this seems to work for me in a limited test env. I'm just using rex to grab the "*" portion of the event and throw it in a field called xml_field

... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
Get Updates on the Splunk Community!

Wondering How to Build Resiliency in the Cloud?

IT leaders are choosing Splunk Cloud as an ideal cloud transformation platform to drive business resilience,  ...

Updated Data Management and AWS GDI Inventory in Splunk Observability

We’re making some changes to Data Management and Infrastructure Inventory for AWS. The Data Management page, ...

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...