Hi all,
I've been struggling with Splunk for weeks now (despite having had Developer training!) and I still can't get it to do what I want, so here begins the first of many questions...
I'm attempting to build an app that does a single parse of some static data. Basically, it's designed to read in lots of files and then, using a dashboard, display the data in a meaningful way.
As such, I'm attempting to do index-time field extraction, because I want the displays to be as fast as possible for the end user. I've tried this a thousand ways and I can't get it working 😞
All of the data is in XML format, and a large chunk of it contains multivalued fields, which is where I'm getting stuck. I can extract multivalued fields with no problem using rex at search time, but I can't get the same extraction to work through the config files. I've put together the following example to show what I mean. It uses just one file, but I'm having the same problem with every file I'm pulling in:
props.conf
[nessus]
SHOULD_LINEMERGE = False
LINE_BREAKER = (?<=</ReportHost>)([\r\n]+)
TRUNCATE = 0
TRANSFORMS-nessus_high_vulnerbility = nessus_high_vulnerbility
transforms.conf
[nessus_high_vulnerbility]
REGEX = <ReportItem.*severity=\"3\".*pluginName=\"([^"]+)\"
FORMAT = nessus_high_vulnerbility::"$1"
LOOKAHEAD = 10000000000
WRITE_META = true
REPEAT_MATCH = true
fields.conf
[nessus_high_vulnerbility]
INDEXED = true
Example data
<Report name="1.1.1.1">
<ReportHost name="1.1.1.1"><HostProperties>
<tag name="HOST_END">Tue Nov 22 12:06:01 2011</tag>
<tag name="system-type">general-purpose</tag>
<tag name="operating-system">Linux Kernel 2.6.9-101.ELsmp on Red Hat Enterprise Linux ES release 4 (Nahant Update 9)</tag>
<tag name="mac-address">00:00:00:00:00:00</tag>
<ReportItem port="1234" svc_name="snmp?" protocol="udp" severity="3" pluginID="51160" pluginName="SNMP Agent Default Community Name (public)" pluginFamily="SNMP">
</ReportItem>
<ReportItem port="0" svc_name="general" protocol="tcp" severity="3" pluginID="21157" pluginName="Unix Compliance Checks" pluginFamily="Policy Compliance">
</ReportItem>
</ReportHost>
</Report>
Now, if I search for *, Splunk tells me that the "nessus_high_vulnerbility" field has only one result.
But if I run the following search, the "high_vulnerbility" field has two results, which is the correct number:
* | rex "\<ReportItem.*severity=\"3\".*pluginName=\"(?<high_vulnerbility>[^\"]+)\"" max_match=100000
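To rule out the regex itself, here's a quick check outside Splunk. It's a minimal Python sketch, assuming Python's re module behaves like Splunk's PCRE for this particular pattern (both treat . as not crossing newlines and make .* greedy by default). It only tests the regex against the sample event; it says nothing about how Splunk's index-time pipeline writes the fields. first_match_only and all_matches are hypothetical helper names: the first approximates a single regex pass, and the second approximates REPEAT_MATCH = true, which restarts the search where the previous match ended.

```python
import re

# The first event from the sample data above, assuming LINE_BREAKER splits
# right after </ReportHost> (so </Report> ends up in a separate event).
EVENT = """<Report name="1.1.1.1">
<ReportHost name="1.1.1.1"><HostProperties>
<tag name="HOST_END">Tue Nov 22 12:06:01 2011</tag>
<tag name="system-type">general-purpose</tag>
<tag name="operating-system">Linux Kernel 2.6.9-101.ELsmp on Red Hat Enterprise Linux ES release 4 (Nahant Update 9)</tag>
<tag name="mac-address">00:00:00:00:00:00</tag>
<ReportItem port="1234" svc_name="snmp?" protocol="udp" severity="3" pluginID="51160" pluginName="SNMP Agent Default Community Name (public)" pluginFamily="SNMP">
</ReportItem>
<ReportItem port="0" svc_name="general" protocol="tcp" severity="3" pluginID="21157" pluginName="Unix Compliance Checks" pluginFamily="Policy Compliance">
</ReportItem>
</ReportHost>"""

# Same pattern as REGEX in the [nessus_high_vulnerbility] stanza of transforms.conf.
PATTERN = re.compile(r'<ReportItem.*severity=\"3\".*pluginName=\"([^"]+)\"')


def first_match_only(event):
    """Hypothetical helper: run the regex once and keep only the first capture."""
    m = PATTERN.search(event)
    return [m.group(1)] if m else []


def all_matches(event):
    """Hypothetical helper: collect every non-overlapping match, scanning on
    from where the previous match ended (an approximation of REPEAT_MATCH)."""
    return PATTERN.findall(event)


if __name__ == "__main__":
    print(first_match_only(EVENT))
    print(all_matches(EVENT))
```

If the second list contains both plugin names, the pattern itself can find both values in the event. In that case, the difference from the rex search is more likely to lie in how the index-time settings are applied than in the regex.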
I've tried everything I can think of and been through the documentation a hundred times, but I still can't figure it out. Please help!
(PS: apologies if the formatting above doesn't come out right; I'm struggling to get Markdown to play nicely with the pasted code.)