Splunk Search

How to extract multiple values from XML logs and display all events where FieldA is not equal to FieldB?

anilkamath
Engager

I have some XML responses logged in Splunk that are fairly deeply nested. Let's say there are multiple records of the form:

<records>
      <record>
        <Full Name>Ms. Brown Grimes</Full Name>
        <Country>Dronning Maud Land</Country>
        <NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail>
        <Created At>Fri Aug 25 1989 22:17:00 GMT-0700 (Pacific Daylight Time)</Created At>
        <Id>10</Id>
        <Email>Sam.Lemke@mckenzie.info</Email>
      </record>
      <record>
        <Full Name>Irma Ledner I</Full Name>
        <Country>Vatican City</Country>
        <NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail>
        <Created At>Tue Nov 30 1993 08:16:58 GMT-0800 (Pacific Standard Time)</Created At>
        <Id>12</Id>
        <Email>Gabrielle@myrl.biz</Email>
      </record>
    </records>

Now I want to find all records where NotificationEmail is not equal to Email.

What I tried was piping the events to a regex extraction:

rex "<record.*NotificationEmail>(?<nemail>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<.*Email>(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<"

where \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b is the regex to match email.

mIliofotou_splu
Splunk Employee

You can let Splunk extract all the XML fields automatically by changing the props.conf file in the application of interest (say, search).

Here is a stanza example:

[my_xml_logs_source_type]
KV_MODE = xml
...
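
Once KV_MODE = xml is in effect, repeated elements should show up as multivalue fields on each event. The following is a minimal sketch of comparing them pairwise, not a verified solution. It assumes that the automatically extracted field names are records.record.NotificationEmail and records.record.Email (check the fields sidebar for the actual names in your data), that every record contains both elements so the values line up by position, and that no address contains the "|" character used here as a separator. The names pair, nemail, and email are illustrative only.

```spl
sourcetype=my_xml_logs_source_type
| eval pair=mvzip('records.record.NotificationEmail', 'records.record.Email', "|")
| mvexpand pair
| eval nemail=mvindex(split(pair, "|"), 0), email=mvindex(split(pair, "|"), 1)
| where nemail!=email
```

mvzip pairs the two multivalue fields by position, mvexpand produces one result per pair, and the final where keeps only the pairs whose two addresses differ.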

martin_mueller
SplunkTrust

Parsing XML with regex is a painful process, especially considering Splunk has commands tailored specifically for this.

Note that your example is not valid XML: element names cannot contain spaces (as in <Full Name> and <Created At>). Once that's fixed, you can run this:

 search for your events | spath records.record | mvexpand records.record | spath input=records.record | where NOT Email=NotificationEmail

That will extract each record into its own event, parse the elements of the record, and filter according to the email fields.
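
To try this without touching indexed data, the pipeline above can be run against a synthetic event. This is only a sketch: the event is a trimmed copy of the sample from the question, reduced to the Id, NotificationEmail, and Email elements so that it is valid XML, and the final table command is added purely for display.

```spl
| makeresults
| eval _raw="<records><record><Id>10</Id><NotificationEmail>Sam.Lemke@mckenzie.info</NotificationEmail><Email>Sam.Lemke@mckenzie.info</Email></record><record><Id>12</Id><NotificationEmail>GabrielleGmail@gmail.com</NotificationEmail><Email>Gabrielle@myrl.biz</Email></record></records>"
| spath records.record
| mvexpand records.record
| spath input=records.record
| where NOT Email=NotificationEmail
| table Id NotificationEmail Email
```

If the search behaves as described, the record with Id 12 should be the only one left, since it is the only record whose two addresses differ.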

lguinn2
Legend

The problem is that you need to extract multiple copies of the fields, assuming that each event is defined by the "<records>" tag.
Within the event, you have multiple values. There are a couple of ways to deal with this, but one would be:

yoursearchhere
| rex max_match=0 "(?s)\<record\>(?<record>.*?)\</record\>"
| mvexpand record
| rex field=record "(?is)NotificationEmail>(?<nemail>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<.*Email>(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)<"
| where nemail!=email

The first rex and mvexpand break the original event into multiple events, one for each "record". After that, the original rex is applied to the record field (with field=record, and with (?is) so that it ignores case and matches across line breaks) and the comparison is made. I didn't verify that the email regular expression is correct. Personally, I would have done something much simpler:

| rex field=record "(?s)\<NotificationEmail\>(?<nemail>.*?)\</NotificationEmail\>.*?\<Email\>(?<email>.*?)\</Email\>"

somesoni2
Revered Legend

Do you want to filter the whole response (record set) when any of its records has NotificationEmail equal to Email, or filter only the individual records within a response (record set) that have NotificationEmail equal to Email?
