Splunk Search

Help with extracting fields using numeric range

jambajuice
Communicator

I am trying to create a field extraction for events where a plugin_id field matches a range of numbers.

This search returns all of the events that I want:

sourcetype=ossim "Event received" ((plugin_id>=1001 AND plugin_id<=1131) NOT plugin_id=1002)

Here is the regex to extract the signature_id from the events. If I hard-code the plugin_id in the regex, the extraction works. If I try to use the numeric ranges listed below, it fails:

EXTRACT-snort_signature (?i) plugin_id="10[0-9][0-9]|11[0-2][0-9]|113[0-1]".*log="\s*(?P<snort_signature>[^,]+)

Why isn't this extracting the signature properly?

Thanks.

Craig

Paolo_Prigione
Builder

The | in regexes has to be constrained by round brackets, otherwise it will "split" your regex in two pieces and use them as alternative matches.

EXTRACT-snort_signature = (?i) plugin_id="(?:10[0-9][0-9]|11[0-2][0-9]|113[0-1])".*log="\s*(?P<snort_signature>[^,]+)

The (?: creates a non-capturing subpattern. I have also noticed that your regex does not take into account the log field's ending ", so you might want to check that too.
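To see the effect of the grouping outside Splunk, here is a minimal Python sketch. Python's re module accepts the same (?i), (?:...) and (?P<name>...) syntax used here, but the two events and the extract helper are hypothetical and only modeled on the OSSIM format, so treat this as an illustration rather than a test of Splunk itself.

```python
import re

# Hypothetical, shortened events (assumptions, not real log lines).
in_range = ('Event received: event id="0" plugin_id="1001" plugin_sid="1394" '
            'log="[**] SHELLCODE x86 inc ecx NOOP"')
out_of_range = ('Event received: event id="0" plugin_id="4003" plugin_sid="1130" '
                'log="[**] some other message"')

# Ungrouped: the top-level | splits the pattern into three alternatives:
#   1)  plugin_id="10[0-9][0-9]
#   2) 11[0-2][0-9]
#   3) 113[0-1]".*log="\s*(?P<snort_signature>[^,]+)
ungrouped = re.compile(
    r'(?i) plugin_id="10[0-9][0-9]|11[0-2][0-9]|113[0-1]".*log="\s*(?P<snort_signature>[^,]+)'
)

# Grouped: the alternation is confined to the plugin_id value.
grouped = re.compile(
    r'(?i) plugin_id="(?:10[0-9][0-9]|11[0-2][0-9]|113[0-1])".*log="\s*(?P<snort_signature>[^,]+)'
)


def extract(pattern, event):
    """Return the snort_signature group, or None if there is no match
    or the group did not take part in the match."""
    m = pattern.search(event)
    return m.group("snort_signature") if m else None


for name, pattern in (("ungrouped", ungrouped), ("grouped", grouped)):
    for label, event in (("in range", in_range), ("out of range", out_of_range)):
        print(name, "|", label, "|", extract(pattern, event))
```

With the ungrouped pattern, the in-range event matches only the first alternative, so no signature is captured, while the out-of-range event is matched through plugin_sid="1130" and yields a signature it should not. With the grouped pattern, only the in-range event produces a value. Note that the captured value still ends with the closing quote of the log field, because [^,]+ does not stop at it.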

Also, based on this sample line I found elsewhere:

2010-11-18 15:46:20 OSSIM-Message: Event received: event id="0" alarm="0" type="detector" fdate="1969-12-31 19:33:30" date="2010" tzone="0" plugin_id="1001" plugin_sid="1394" src_ip="63.215.202.48" src_port="80" dst_ip="82.150.0.6" dst_port="8197" sensor="10.1.116.31" interface="eth1" protocol="TCP" asset_src="2" asset_dst="2" log="[**] [1:1394:12] SHELLCODE x86 inc ecx NOOP [**] [Classification: Executable code was detected] [Priority: 1] 11/18-15:47:00.198740 63.215.202.48:80 -> 82.150.0.6:8197"

If you have not altered Splunk's automated field extraction, you would end up with a ton of fields, including plugin_id and log. Why not just alias the field "log" to "snort_signature" in props.conf?

FIELDALIAS-sno_sig = log as snort_signature

This would certainly improve performance, but it would make the aliased field available for every plugin_id value, not only those in your range.

Rob
Splunk Employee

He won't need the '=' as he is not using any options for the extract command and apparently making a call to a stanza labeled 'snort_signature' in his transforms.conf. However, in case the extract syntax needs to be checked, here is the documentation reference http://www.splunk.com/base/Documentation/4.1.6/SearchReference/Extract

Rob
Splunk Employee

From the line you have listed, you are going to match something like the following: plugin_id="1001" unrelatedInfoThatWillBeMatched somelog=" snortSignature anythingElse

The parenthetical value will be: snortSignature anythingElse

The [^,]+ part will match anything after your first match, as it is essentially looking for anything that could be considered the start of a line.

I don't know what your events look like but you might want to try the following regex, which I based on your own regex, to hopefully match what I think the event may contain.

plugin_id="(?:10\d[13-9]|11[0-2]\d|113[0-1])".*log="\s*(?P<snort_signature>.*)"

This will match an event whose plugin_id is in range and that looks somewhat like this:

plugin_id="1001" irrelevantInfo somelog="   snort123" more irrelevantInfo

The extracted value for snort_signature will be: snort123

Furthermore, the extraction is going to depend on what you have defined in your snort_signature stanza in your transforms.conf file.

If you can provide an example line that you are trying to extract fields from then we can see about fine tuning the extraction regex to match what you are looking for.
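One thing worth checking by brute force is which plugin_id values a range pattern really accepts. The search in the question wants 1001 through 1131 without 1002, and a character class on the last digit applies to every 10xx value, so excluding 1002 alone needs its own alternative. The sketch below is Python, and the "tight" pattern is just one possible way to write the range, not something taken from the question.

```python
import re

# Target set assumed from the search in the question:
# 1001..1131 inclusive, excluding 1002.
wanted = {n for n in range(1001, 1132) if n != 1002}

# Simple alternation: covers 1000..1131, so it also lets 1000 and 1002 through.
loose = re.compile(r'(?:10[0-9][0-9]|11[0-2][0-9]|113[0-1])')

# One possible tighter alternation (illustrative only):
#   100[13-9]    -> 1001, 1003..1009
#   10[1-9][0-9] -> 1010..1099
#   11[0-2][0-9] -> 1100..1129
#   113[01]      -> 1130, 1131
tight = re.compile(r'(?:100[13-9]|10[1-9][0-9]|11[0-2][0-9]|113[01])')


def accepted(pattern):
    """Return every ID below 10000 whose decimal form the pattern matches in full."""
    return {n for n in range(0, 10000) if pattern.fullmatch(str(n))}


# IDs the loose pattern accepts that the search would have excluded.
print(sorted(accepted(loose) - wanted))
# Whether the tight pattern accepts exactly the wanted set.
print(accepted(tight) == wanted)
```

In the actual extraction the surrounding quotes in plugin_id="..." play the role that fullmatch plays here, so the group can be dropped in between them unchanged.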

Rob
Splunk Employee

However, in reference to the EXTRACT syntax being used, here is an excerpt from the Splunk documentation found at http://www.splunk.com/base/Documentation/4.1.6/SearchReference/Extract


<extractor-name>
Syntax: <string>
Description: A stanza that can be found in transforms.conf. This is used when props.conf did not explicitly cause an extraction for this source, sourcetype, or host.

Rob
Splunk Employee

Paolo, you are correct with the caret immediately after the bracket. However, since there is no closing quote, it means that everything after the beginning quote will be matched to the field that is to be mapped. Without an event line to customize this to, we are making more of a guess as to what the specific regex should be.

Paolo_Prigione
Builder

Rob, EXTRACT-snort_signature is a configuration in props.conf which does not rely on transforms.conf (that would be something like REPORT-snort_signature = transforms_stanza_name). It isn't the extract search command. Hence southeringtonp's comment about the missing '='.
Also, in regex language, [^,]+ means "everything that's not a comma": the ^ as the first char inside a character class ([]) is a negation.
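A quick way to see what the negated class does is to run it on a line shaped like the sample event. This is a minimal Python sketch; the events are shortened and partly hypothetical, and the [^"]+ variant is only one possible way to stop at the closing quote.

```python
import re

# Shortened event modeled on the sample line quoted earlier (no commas in log).
event = ('plugin_id="1001" plugin_sid="1394" '
         'log="[**] [1:1394:12] SHELLCODE x86 inc ecx NOOP [**] [Priority: 1]"')

# Hypothetical event whose log value does contain a comma.
event_with_comma = 'plugin_id="1001" log="first part, second part"'

# [^,]+ : one or more characters that are not a comma.
not_comma = re.compile(r'log="\s*(?P<snort_signature>[^,]+)')

# [^"]+ followed by a literal quote: stops at the end of the quoted value.
not_quote = re.compile(r'log="\s*(?P<snort_signature>[^"]+)"')


def extract(pattern, text):
    """Return the snort_signature group, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group("snort_signature") if m else None


print(repr(extract(not_comma, event)))             # runs to end of line, keeps the closing quote
print(repr(extract(not_quote, event)))             # stops before the closing quote
print(repr(extract(not_comma, event_with_comma)))  # stops at the first comma
```

So with no comma in the log value, [^,]+ keeps going to the end of the event, closing quote included, which is why the ending quote matters.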

southeringtonp
Motivator

Your extract line is missing an '=' after EXTRACT-snort_signature -- just a typo?
