
How to extract XML data from mixed content into one field for later use with spath?

roshannon
New Member

I have a log with mixed output that contains both XML and non-XML data. I want to extract the XML into a field that I can later run spath on to get individual fields. My sample data is below. I am looking to get the entire <root>...</root> block into a single field, so that later I can use spath to pull out the individual fields I might want to search on. I have seen other recommendations to put XML into a single field for later spath usage, but none of them showed how to actually do that.

2015 May 22 15:23:44:024 GMT -0700 BW.DomainDMSEvents-DomainDMSEvents-P01 User [BW-User] - Job-10003 [UtilityProcesses/CreateAuditTrail.process/Log]: AuditTrail: 10003|Projects/DomainDMSEvents/ProcDefs/Starters/PublishDMSScanEvents.process||file|||2015-05-22T15:23:44.022-07:00|DomainDMSEvents-DomainDMSEvents-P01||||false||
|<root>
    <messageIn>
        <channel>file</channel>
        <msgID>1432333424013</msgID>
        <corlID>1432333424013</corlID>
        <raw><?xml version="1.0" encoding="UTF-8"?>
   <ns0:EventSourceOuputNoContentClass xmlns:ns0="http://www.tibco.com/namespaces/tnt/plugins/file"><action>remove</action><timeOccurred>1432333424013</timeOccurred><fileInfo><fullName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</fullName><fileName>DMSEvents.txt</fileName><location>/nfs/appdata/CTSE/OMS/DMS</location><configuredFileName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</configuredFileName><type>file</type><readProtected>true</readProtected><writeProtected>true</writeProtected><size>5651</size><lastModified>2015-05-20T12:07:28-07:00</lastModified></fileInfo></ns0:EventSourceOuputNoContentClass></raw>
            <EMSHeaderProperties>
                <header>
                    <name>fileNewName</name>
                    <value>/nfs/appdata/CTSE/OMS/DMS/processed/DMSEvents.txt</value>
                </header>
                <header>
                    <name>fileName</name>
                    <value>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</value>
                </header>
                <header>
                    <name>timestamp</name>
                    <value>1432333424017</value>
                </header>
            </EMSHeaderProperties>
            <parsed>
                <type>filePoller</type>
                <other/>
            </parsed>
        </messageIn>
        <messageOut>
            <name>DocImageEvent</name>
            <TXInfo>
                <tranType>DocImageEvent</tranType>
                <evtType>DocImageEvent</evtType>
                <topicOverride>Domain.CTS.CTSE.Canonical.S2C.DomainDMSEvents.DocImageEvent</topicOverride>
            </TXInfo>
        </messageOut>
        <psDef>
            <funcArea>S2C</funcArea>
            <appSource>DomainDMSEvents</appSource>
            <txIdentifier>DocImageEvent</txIdentifier>
            <startTS>1432333424017</startTS>
        </psDef>
    </root>|

maciep
Champion

Not sure how consistent that log format is, but something like this seems to work for me in a limited test environment. I'm just using rex to grab the <root>...</root> portion of the event and throw it in a field called xml_field:

... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
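Once xml_field exists, spath can read it instead of _raw through its input argument. This is a sketch only and has not been tested against this exact log: it assumes the whole <root>...</root> block lands in a single event, and the output names appSource and txIdentifier are arbitrary names picked for illustration. The embedded <?xml ...?> declaration inside <raw> may also affect how spath parses that element.

```spl
... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
| spath input=xml_field path=root.psDef.appSource output=appSource
| spath input=xml_field path=root.psDef.txIdentifier output=txIdentifier
| search appSource="DomainDMSEvents"
| table _time appSource txIdentifier
```

Running spath input=xml_field without a path extracts every element instead, with field names built from the element path (for example root.psDef.appSource). Repeated elements such as the <header> entries come back as multivalue fields.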
