Is it possible to filter parts of XML events before indexing?

kenferguson · 09-16-2016

I'm pretty sure this isn't possible from what I've read, but I want to check with some experts. I've looked around the Splunk site, but the only similar question I found was an ancient, unanswered one about a non-XML feed.

I've got an incoming stream of XML data in which each packet is pretty large. I'd obviously like to index only the data I actually need, since otherwise the cost of the Splunk license becomes prohibitive. Can anyone tell me whether there's a way to process the data before it gets indexed and filter out part of the XML?

I had a look at Heavy Forwarders, but I think they're only useful if I want to filter out events in their entirety. I want to keep all the events, but throw away part of the data in each one.

Example input:

<xmlblob>
 <comment>Lorem ipsum dolor sit amet, consectetur adipiscing elit
   ...
 </comment>
 <interestingdata>55</interestingdata>
</xmlblob>

which I'd want to convert to

<xmlblob>
 <interestingdata>55</interestingdata>
</xmlblob>
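To make the desired input-to-output mapping concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not a Splunk mechanism and only illustrates the transformation; the function name keep_only is hypothetical, and it assumes each event is one well-formed XML document whose unwanted elements are direct children of the root.

```python
import xml.etree.ElementTree as ET


def keep_only(xml_text: str, keep_tags: set[str]) -> str:
    """Drop every direct child of the root element whose tag is not in keep_tags."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy: removing children while iterating root directly skips some.
    for child in list(root):
        if child.tag not in keep_tags:
            root.remove(child)  # the child's trailing whitespace (tail) goes with it
    return ET.tostring(root, encoding="unicode")


# Shortened copy of the example input above.
sample = """<xmlblob>
 <comment>Lorem ipsum dolor sit amet, consectetur adipiscing elit
 </comment>
 <interestingdata>55</interestingdata>
</xmlblob>"""

print(keep_only(sample, {"interestingdata"}))
```

This only works on complete, well-formed events; ET.fromstring raises xml.etree.ElementTree.ParseError on anything else.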

Thanks in advance for any help or thoughts!

jkat54 · 09-16-2016

For this I would use SEDCMD-<class> in my props.conf.

For example, props.conf on the heavy forwarders or indexers (wherever the data is parsed):

[myXMLDataSourceType]
...
SHOULD_LINEMERGE = True
SEDCMD-aaa_removesComments = s/(?s)\s+<comment>.*?<\/comment>//g
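Because the <comment> element in the example spans several lines, it matters whether "." can match a newline. The sketch below emulates the substitution with Python's re module rather than Splunk's own engine, assuming Splunk's PCRE-based SEDCMD treats the (?s) flag and the lazy ".*?" the same way. apply_sedcmd is a hypothetical helper name.

```python
import re

# Python stand-in for the SEDCMD regex (assumption: PCRE behaves the same here).
# (?s) lets "." match newlines, so the multi-line <comment> is covered;
# ".*?" is lazy, so it stops at the first </comment> rather than the last one.
REMOVE_COMMENTS = re.compile(r"(?s)\s+<comment>.*?</comment>")


def apply_sedcmd(event: str) -> str:
    # Equivalent of s/<regex>//g: replace every match with the empty string.
    return REMOVE_COMMENTS.sub("", event)


event = """<xmlblob>
 <comment>Lorem ipsum dolor sit amet, consectetur adipiscing elit
 </comment>
 <interestingdata>55</interestingdata>
</xmlblob>"""

print(apply_sedcmd(event))

# Without (?s), "." stops at the first newline, so the pattern matches nothing.
naive = re.sub(r"\s+<comment>.*</comment>", "", event)
print(naive == event)  # True
```

The lazy quantifier matters once an event contains more than one <comment>: a greedy ".*" would also delete everything between the first <comment> and the last </comment>, including the data you want to keep.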

I put aaa in the class name because SEDCMDs are applied in ASCII order of their class names. If you're modifying the data in several steps, you probably want those steps to run in a very specific sequence.
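To show why the class-name prefix matters, here is a minimal Python sketch that applies sed-style rules sorted by class name, taking the ASCII ordering described in the answer as given. The second rule (renaming <comment> to <note>) and the helper apply_sedcmds are hypothetical, added only to show that order can change the result.

```python
import re


def apply_sedcmds(event: str, sedcmds: dict[str, tuple[str, str]]) -> str:
    """Apply (pattern, replacement) pairs in ASCII order of their class names."""
    for class_name in sorted(sedcmds):  # sorting str compares code points (ASCII order)
        pattern, replacement = sedcmds[class_name]
        event = re.sub(pattern, replacement, event)  # like s/pattern/replacement/g
    return event


event = "<xmlblob>\n <comment>noise</comment>\n <interestingdata>55</interestingdata>\n</xmlblob>"

remove = (r"(?s)\s+<comment>.*?</comment>", "")
rename = (r"<(/?)comment>", r"<\1note>")  # hypothetical second rule

# "aaa_" sorts first, so the comment is removed before the rename can touch it.
print(apply_sedcmds(event, {"aaa_removesComments": remove, "bbb_renameComment": rename}))

# Swap the prefixes: the rename runs first and the removal no longer matches.
print(apply_sedcmds(event, {"aaa_renameComment": rename, "bbb_removesComments": remove}))
```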

kenferguson · ‎09-23-2016

Thanks. I haven't had a chance to verify this because I've been pulled onto another part of the project. I'll take a look when I have a moment.

Thanks!
