
rex - matching everything until a tab

wsw70
Communicator

Hello,

I am trying to parse a log from a Tipping Point IPS. Here is an example of a log line I get (cut for clarity; there is normally more on the line):

Nov 28 07:37:50 10.22.250.151 8 4   dab8b814-b100-11e0-06b9-e527e93f10b7    00000001-0001-0001-0001-000000004270    4270: HTTP: PHP Code Injection  4270

Everything is OK when parsing it via

rex "[a-zA-Z]+\\s+\\d+\\s+\\d+:\\d+:\\d+\\s+\\d+\\.\\d+\\.\\d+\\.\\d+\\s+(?P<ACTION>\\d+)\\s+(?P<CRIT>\\d+)\\s+[0-9-]+\\s+[0-9-]+\\s+(?P<ATTACKID>\\d+):"

and I get the ACTION, CRIT and ATTACKID fields. So far so good.
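A minimal Python sketch of the same header extraction, using Python's re module as a stand-in for Splunk's PCRE-based rex (named groups, \s and \d behave the same way for this pattern). Assumptions: the fields after CRIT are tab-separated, and the trailing \t1\tmore is a hypothetical placeholder for the part of the line that was cut. The two ID fields in the sample contain hex letters (a to f), which [0-9-]+ does not match, so the sketch widens those classes to [0-9a-f-]+. The backslashes are single because a Python raw string needs no extra escaping.

```python
import re

# Sample event from the question. Assumption: tab-separated fields after the
# syslog header; "\t1\tmore" is a hypothetical stand-in for the truncated rest.
EVENT = (
    "Nov 28 07:37:50 10.22.250.151 8\t4\t"
    "dab8b814-b100-11e0-06b9-e527e93f10b7\t"
    "00000001-0001-0001-0001-000000004270\t"
    "4270: HTTP: PHP Code Injection\t4270\t1\tmore"
)

# Same structure as the working rex; ID classes widened to hex digits because
# the sample IDs contain the letters a-f.
HEADER = (
    r"[a-zA-Z]+\s+\d+\s+\d+:\d+:\d+\s+\d+\.\d+\.\d+\.\d+\s+"
    r"(?P<ACTION>\d+)\s+(?P<CRIT>\d+)\s+[0-9a-f-]+\s+[0-9a-f-]+\s+"
    r"(?P<ATTACKID>\d+):"
)

m = re.search(HEADER, EVENT)
fields = m.groupdict() if m else {}
print(fields)  # {'ACTION': '8', 'CRIT': '4', 'ATTACKID': '4270'}
```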

I then wanted to extract the next piece of information, the attack description (HTTP: PHP Code Injection). Fields are separated by a tab, so I tried

rex "[a-zA-Z]+\\s+\\d+\\s+\\d+:\\d+:\\d+\\s+\\d+\\.\\d+\\.\\d+\\.\\d+\\s+(?P<ACTION>\\d+)\\s+(?P<CRIT>\\d+)\\s+[0-9-]+\\s+[0-9-]+\\s+(?P<ATTACKID>\\d+):\\s+(?P<ATTACKNAME>.+)\\t\\d+"

The idea is to match every character up to the tab. Instead, I end up capturing the rest of the line (i.e. the match does not stop at the tab).

I ran this through Rubular with the source data copied and pasted from Splunk, and it works there (so there is indeed a tab as the separator; I can also see it in the search window). It looks like either there is a specific way to match the tab character, or .+ matches everything up to the end of the line.
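The behaviour described above can be reproduced outside Splunk. In this Python sketch (Python re standing in for PCRE; assuming tab-separated fields, with a hypothetical \t1\tmore standing in for the truncated remainder of the line), \t matches the tab without any problem. The issue is that the greedy .+ first runs to the end of the line and then gives characters back only until \t\d+ can match, which happens at the last tab followed by a digit.

```python
import re

# Tail of the event only; "\t1\tmore" is a hypothetical stand-in for the
# rest of the truncated line (assumption: tab-separated fields).
TAIL = "4270: HTTP: PHP Code Injection\t4270\t1\tmore"

# Greedy: .+ consumes to the end of the line, then backtracks just far enough
# for \t\d+ to match, i.e. to the LAST tab that is followed by a digit.
greedy = re.search(r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>.+)\t\d+", TAIL)
print(repr(greedy.group("ATTACKNAME")))  # 'HTTP: PHP Code Injection\t4270'
```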

Thanks a lot for any pointers (and sorry, as my question must be obvious to someone used to regex). -- WoJ


Ayn
Legend

You need to use a non-greedy match. The current greedy one looks like this:

(?P<ATTACKNAME>.+)\t

which tells the regex engine to return the longest possible match that satisfies the conditions. The corresponding non-greedy match would be (note the "?"):

(?P<ATTACKNAME>.+?)\t

This tells the regex engine to return the shortest possible match, i.e. only match up until the first tab character it finds.
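A sketch of the lazy version against a hypothetical tail of the event (Python re as a stand-in for Splunk's PCRE; the \t1\tmore remainder is an assumption in place of the truncated line). The .+? quantifier extends one character at a time and stops at the first position where \t\d+ can match.

```python
import re

# Hypothetical tail of the event; "\t1\tmore" stands in for the cut-off rest.
TAIL = "4270: HTTP: PHP Code Injection\t4270\t1\tmore"

# Lazy: .+? stops at the FIRST tab followed by a digit.
lazy = re.search(r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>.+?)\t\d+", TAIL)
print(lazy.group("ATTACKNAME"))  # HTTP: PHP Code Injection
```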


wsw70
Communicator

Thanks, Ayn, for the answer.
I also got the same result by replacing .+ with [^\t]+
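The negated character class can be sketched the same way (same hypothetical tail, with Python re standing in for PCRE). Because [^\t]+ can never consume a tab, the capture must end at the first one regardless of what follows; the trailing \t\d+ is kept only to mirror the original pattern.

```python
import re

# Hypothetical tail of the event; "\t1\tmore" stands in for the cut-off rest.
TAIL = "4270: HTTP: PHP Code Injection\t4270\t1\tmore"

# [^\t]+ matches any run of non-tab characters, so it cannot run past a tab.
neg = re.search(r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>[^\t]+)\t\d+", TAIL)
print(neg.group("ATTACKNAME"))  # HTTP: PHP Code Injection
```

Compared with .+?, the negated class states the stopping condition directly, which generally also means less backtracking on long lines.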

