Splunk Search

Multivalue XML extraction not working

responsys_cm
Builder

I'm trying to add several lines of XML to a multi-valued field. The data looks like:

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[PHP Hashtables Denial of Service - The Exploit-DB Ref : 18296]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18296]]&gt;&lt;/LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[PHP Hash Table Collision Proof Of Concept - The Exploit-DB Ref : 18305]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18305]]&gt;&lt;/LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4153]]></REF>

<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]&gt;&lt;/LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2011-4885]]></REF>

<DESC><![CDATA[MyBulletinBoard (MyBB) <= 1.1.5 (CLIENT-IP) SQL Injection Exploit - The Exploit-DB Ref : 2012]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/2012]]&gt;&lt;/LINK>

</EXPLT>

<EXPLT>

<REF><![CDATA[CVE-2012-0781]]></REF>

<DESC><![CDATA[PHP 5.3.8 Multiple Vulnerabilities - The Exploit-DB Ref : 18370]]></DESC>

<LINK><![CDATA[http://www.exploit-db.com/exploits/18370]]&gt;&lt;/LINK>

</EXPLT>

My transforms.conf looks like:

[qualys_exploit]

REGEX = (?mis)(&lt;EXPLT&gt;.*&lt;/EXPLT&gt;)

FORMAT = qualys_exploit::$1

MV_ADD = true

props.conf:

REPORT-qualys_exploit = qualys_exploit

Splunk is taking everything between the first opening EXPLT tag and last closing EXPLT tag and making it a single event. What am I doing wrong that it's not treating these as multiple individual events?

Thx.

C

Tags (1)
0 Karma
1 Solution

andreas
Explorer

The quantifier * in the REGEX is greedy, so the expression . * is eating up all the chars before the last </EXPLT>
Try adding a ? after the * to make it non-greedy, so the regex "stops" at the next </EXPLT>, not the last.

REGEX = (?mis)(<EXPLT>.*?</EXPLT>)

View solution in original post

andreas
Explorer

The quantifier * in the REGEX is greedy, so the expression . * is eating up all the chars before the last </EXPLT>
Try adding a ? after the * to make it non-greedy, so the regex "stops" at the next </EXPLT>, not the last.

REGEX = (?mis)(<EXPLT>.*?</EXPLT>)

Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...