I have a mixed output log that contains both XML and non-XML data. I want to extract the XML into a single field so that I can later run spath on it to get individual fields. My sample data is below. I want to capture the entire <root>...</root> block in one field. I have seen other recommendations to put XML into a single field for later spath usage, but none of them showed how to do that.
2015 May 22 15:23:44:024 GMT -0700 BW.DomainDMSEvents-DomainDMSEvents-P01 User [BW-User] - Job-10003 [UtilityProcesses/CreateAuditTrail.process/Log]: AuditTrail: 10003|Projects/DomainDMSEvents/ProcDefs/Starters/PublishDMSScanEvents.process||file|||2015-05-22T15:23:44.022-07:00|DomainDMSEvents-DomainDMSEvents-P01||||false||
|<root>
<messageIn>
<channel>file</channel>
<msgID>1432333424013</msgID>
<corlID>1432333424013</corlID>
<raw><?xml version="1.0" encoding="UTF-8"?>
<ns0:EventSourceOuputNoContentClass xmlns:ns0="http://www.tibco.com/namespaces/tnt/plugins/file"><action>remove</action><timeOccurred>1432333424013</timeOccurred><fileInfo><fullName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</fullName><fileName>DMSEvents.txt</fileName><location>/nfs/appdata/CTSE/OMS/DMS</location><configuredFileName>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</configuredFileName><type>file</type><readProtected>true</readProtected><writeProtected>true</writeProtected><size>5651</size><lastModified>2015-05-20T12:07:28-07:00</lastModified></fileInfo></ns0:EventSourceOuputNoContentClass></raw>
<EMSHeaderProperties>
<header>
<name>fileNewName</name>
<value>/nfs/appdata/CTSE/OMS/DMS/processed/DMSEvents.txt</value>
</header>
<header>
<name>fileName</name>
<value>/nfs/appdata/CTSE/OMS/DMS/DMSEvents.txt</value>
</header>
<header>
<name>timestamp</name>
<value>1432333424017</value>
</header>
</EMSHeaderProperties>
<parsed>
<type>filePoller</type>
<other/>
</parsed>
</messageIn>
<messageOut>
<name>DocImageEvent</name>
<TXInfo>
<tranType>DocImageEvent</tranType>
<evtType>DocImageEvent</evtType>
<topicOverride>Domain.CTS.CTSE.Canonical.S2C.DomainDMSEvents.DocImageEvent</topicOverride>
</TXInfo>
</messageOut>
<psDef>
<funcArea>S2C</funcArea>
<appSource>DomainDMSEvents</appSource>
<txIdentifier>DocImageEvent</txIdentifier>
<startTS>1432333424017</startTS>
</psDef>
</root>|
I'm not sure how consistent that log format is, but something like this seems to work for me in a limited test environment. I'm just using rex to grab the <root>...</root> portion of the event and put it in a field called xml_field:
... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
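Once xml_field is populated, spath can be pointed at it with input=. The following is only a sketch of how that might look against the sample event, not something tested on your data: the output field names funcArea and messageName are made up for illustration, and the paths assume the element nesting shown in the sample (root > psDef > funcArea and root > messageOut > name).

```
... | rex "(?<xml_field>\<root\>[\s\S]+\<\/root\>)"
    | spath input=xml_field path=root.psDef.funcArea output=funcArea
    | spath input=xml_field path=root.messageOut.name output=messageName
    | search funcArea="S2C"
```

Note that [\s\S]+ is greedy, so if a single event ever contained more than one <root> block, the capture would run from the first <root> to the last </root>. Also, the <raw> element in the sample carries an embedded <?xml ...?> declaration, so it is worth checking that spath returns the fields under it as expected before relying on them.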