Splunk Search

Multivalue XML extraction not working

responsys_cm
Builder

I'm trying to extract several blocks of XML into a multivalue field. The data looks like:

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[PHP Hashtables Denial of Service - The Exploit-DB Ref : 18296]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18296]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[PHP Hash Table Collision Proof Of Concept - The Exploit-DB Ref : 18305]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18305]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4153]]></REF>

<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[MyBulletinBoard (MyBB) <= 1.1.5 (CLIENT-IP) SQL Injection Exploit - The Exploit-DB Ref : 2012]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/2012]]></LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2012-0781]]></REF>

<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>

</EXPLT>

My transforms.conf looks like:

[qualys_exploit]

REGEX = (?mis)(<EXPLT>.*</EXPLT>)

FORMAT = qualys_exploit::$1

MV_ADD = true

props.conf:

REPORT-qualys_exploit = qualys_exploit

Splunk is taking everything between the first opening EXPLT tag and the last closing EXPLT tag and making it a single value. What am I doing wrong that it's not treating these as multiple individual values?

Thx.

C


andreas
Explorer

The * quantifier in the REGEX is greedy, so the expression .* consumes every character up to the last </EXPLT>.
Try adding a ? after the * to make it non-greedy, so the match stops at the next </EXPLT> rather than the last.

REGEX = (?mis)(<EXPLT>.*?</EXPLT>)
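
To see the difference outside Splunk, here is a minimal sketch using Python's re module as a stand-in for Splunk's regex engine. The greedy and non-greedy behavior is the same for this pattern, but the sample text (a trimmed version of the data in the question) and the helper name extract_blocks are assumptions made up for illustration, not anything Splunk provides.

```python
import re

# Trimmed sample modeled on the XML in the question (two blocks only).
SAMPLE = """<EXPLT>
<REF><![CDATA[CVE-2011-4885]]></REF>
<LINK><![CDATA[http://www.exploit-db.com/exploits/18296]]></LINK>
</EXPLT>
<EXPLT>
<REF><![CDATA[CVE-2011-4153]]></REF>
<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]></LINK>
</EXPLT>"""

# Equivalent of the inline (?mis) flags: multiline, case-insensitive, dot matches newline.
FLAGS = re.M | re.I | re.S


def extract_blocks(pattern, text):
    """Hypothetical helper: return every capture-group match of pattern in text."""
    return re.findall(pattern, text, FLAGS)


# Greedy: .* runs to the LAST </EXPLT>, so everything lands in one match.
greedy = extract_blocks(r"(<EXPLT>.*</EXPLT>)", SAMPLE)

# Non-greedy: .*? stops at the NEXT </EXPLT>, giving one match per block.
lazy = extract_blocks(r"(<EXPLT>.*?</EXPLT>)", SAMPLE)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

With MV_ADD = true, each of the non-greedy matches should become its own value in the multivalue field, which is the behavior the question is after.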

