I'm pretty sure this isn't possible from reading around, but want to check with some experts. I've looked around the Splunk site but the only similar question I found was an ancient unanswered one about a non-XML feed.
I've got an incoming stream of data in XML form where each packet is pretty large. I'd obviously like to only index the data that I actually need, otherwise the cost of the Splunk license becomes prohibitive, so can anyone tell me if there's a way of processing the data before it gets indexed and filtering out a subset of the XML?
I had a look at Heavy Forwarders, but I think they're only useful if I were looking to filter out events in their entirety. I want to keep all the events but throw away part of the data in each one.
Example input:
<xmlblob>
<comment>Lorem ipsum dolor sit amet, consectetur adipiscing elit
...
</comment>
<interestingdata>55</interestingdata>
</xmlblob>
which I'd want to convert to
<xmlblob>
<interestingdata>55</interestingdata>
</xmlblob>
Thanks in advance for any help or thoughts!
For this I would use SEDCMD-<class> in my props.conf.
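For example, in props.conf on whichever instances parse the data (the indexers or heavy forwarders; as far as I know, universal forwarders don't apply SEDCMD to unstructured data):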
[myXMLDataSourceType]
...
SHOULD_LINEMERGE = True
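SEDCMD-aaa_removesComments = s/\s*<comment>[\s\S]*?<\/comment>//g
I used [\s\S]*? rather than .* so the match can span the line breaks inside the comment and stops at the first closing tag instead of the last. I put aaa on the class because the SEDCMDs are applied in ASCII order of their class names. Since each one modifies the data the next one sees, you probably want them to run in a very specific sequence.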
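If you want to sanity-check the substitution before touching props.conf, here is a minimal Python sketch that approximates it offline. The function name strip_comments and the sample event are made up for illustration, and Python's re module is not the same engine Splunk uses, so treat this as a rough check of the pattern rather than proof of what Splunk will do. It assumes the goal is to remove every <comment> element, including ones that span several lines.

```python
import re

# Approximation of the SEDCMD rule: optional leading whitespace, then a
# <comment> element matched non-greedily across line breaks (DOTALL).
# This pattern is an assumption about the intended behaviour, not Splunk's API.
COMMENT_RE = re.compile(r"\s*<comment>.*?</comment>", re.DOTALL)


def strip_comments(event: str) -> str:
    """Return the event with every <comment>...</comment> element removed."""
    return COMMENT_RE.sub("", event)


# Made-up sample event modelled on the example input in the question.
sample = """<xmlblob>
<comment>Lorem ipsum dolor sit amet, consectetur adipiscing elit
more text
</comment>
<interestingdata>55</interestingdata>
</xmlblob>"""

if __name__ == "__main__":
    print(strip_comments(sample))
```

Running it against a few real events saved to a file should show quickly whether anything other than the comment block gets eaten.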
Thanks - I haven't had a chance to verify because I've been pulled on to another part of the project. Will take a look when I have a moment.
Thanks!