Hello,
I am trying to parse a log from a Tipping Point IPS. Here is an example line (cut for clarity; there is normally more on the line):
Nov 28 07:37:50 10.22.250.151 8 4 dab8b814-b100-11e0-06b9-e527e93f10b7 00000001-0001-0001-0001-000000004270 4270: HTTP: PHP Code Injection 4270
Everything is OK when parsing it via
rex "[a-zA-Z]+\\s+\\d+\\s+\\d+:\\d+:\\d+\\s+\\d+\\.\\d+\\.\\d+\\.\\d+\\s+(?P<ACTION>\\d+)\\s+(?P<CRIT>\\d+)\\s+[0-9-]+\\s+[0-9-]+\\s+(?P<ATTACKID>\\d+):"
and I get the ACTION, CRIT and ATTACKID fields. So far so good.
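A minimal Python sketch of that first extraction, for checking the pattern outside Splunk. Python's re module uses the same (?P<name>...) group syntax, and single backslashes in a raw string stand in for the doubled ones in the rex string. Two assumptions: the syslog header (date, time, host) is space-separated while the IPS fields are tab-separated, and the ID class is widened from [0-9-]+ to [0-9a-f-]+, because as posted [0-9-]+ cannot match the hex letters in the sample ID (dab8b814-...).

```python
import re

# Sample line from the post, rebuilt with tabs between the IPS fields.
# Assumption: the syslog header itself is space-separated.
LINE = ("Nov 28 07:37:50 10.22.250.151 "
        "8\t4\tdab8b814-b100-11e0-06b9-e527e93f10b7\t"
        "00000001-0001-0001-0001-000000004270\t"
        "4270: HTTP: PHP Code Injection\t4270")

# Same structure as the rex above; [0-9a-f-]+ (assumption) lets the
# hex IDs match where the posted [0-9-]+ would stop at the first letter.
HEADER = re.compile(
    r"[a-zA-Z]+\s+\d+\s+\d+:\d+:\d+\s+\d+\.\d+\.\d+\.\d+\s+"
    r"(?P<ACTION>\d+)\s+(?P<CRIT>\d+)\s+[0-9a-f-]+\s+[0-9a-f-]+\s+"
    r"(?P<ATTACKID>\d+):"
)

m = HEADER.search(LINE)
print(m.group("ACTION"), m.group("CRIT"), m.group("ATTACKID"))  # 8 4 4270
```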
I then wanted to get the next piece of information, which is the attack description (HTTP: PHP Code Injection). Fields are separated by a TAB, so I tried:
rex "[a-zA-Z]+\\s+\\d+\\s+\\d+:\\d+:\\d+\\s+\\d+\\.\\d+\\.\\d+\\.\\d+\\s+(?P<ACTION>\\d+)\\s+(?P<CRIT>\\d+)\\s+[0-9-]+\\s+[0-9-]+\\s+(?P<ATTACKID>\\d+):\s+(?P<ATTACKNAME>.+)\\t\\d+"
the idea being to match every character up to the tab. Instead, I end up capturing the rest of the line (i.e. the match does not stop at the tab).
I tried running this through Rubular with the source data copied and pasted from Splunk, and it works there (so there really is a tab as the separator; I can also see it in the search window). It looks like either there is a specific way to match the tab character, or .+ catches everything until the end of the line.
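A small Python sketch that reproduces the symptom, keeping only the tail of the pattern. The fields after the second 4270 are hypothetical stand-ins for the part of the line the post cut off. Because . also matches a tab, the greedy group runs to the end of the line and the engine then backs off only as far as the last tab that is followed by digits.

```python
import re

# Tail of the sample line with tab separators. "\t1\t80" is a hypothetical
# stand-in for the extra fields that normally follow on the real line.
LINE = "4270: HTTP: PHP Code Injection\t4270\t1\t80"

# Greedy .+ : grabs as much as possible, then backtracks to the LAST "\t<digits>".
greedy = re.search(r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>.+)\t\d+", LINE)
print(repr(greedy.group("ATTACKNAME")))  # 'HTTP: PHP Code Injection\t4270\t1'
```

This may also explain why the pattern can look correct on a shortened sample: if only one tab-plus-digits follows the attack name, the greedy and non-greedy forms capture the same text.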
Thanks a lot for any pointers (and sorry, as my question must be obvious to someone used to regex). -- WoJ
You need to use a non-greedy match. The current greedy one looks like this:
(?P<ATTACKNAME>.+)\t
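which tells the regex engine to make the group as long as possible while the rest of the pattern can still match. Since . also matches a tab, .+ runs to the end of the line and then gives characters back only until the rest of the pattern fits, i.e. at the last suitable tab rather than the first. The corresponding non-greedy match would be (note the "?"):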
(?P<ATTACKNAME>.+?)\t
This tells the regex engine to return the shortest possible match, i.e. to stop at the first tab where the rest of the pattern can match.
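The same hypothetical line with the non-greedy group, as a Python sketch: .+? extends one character at a time and stops as soon as \t\d+ can match, which here is the tab right after the attack name. The trailing "\t1\t80" fields are invented stand-ins for the cut-off part of the log.

```python
import re

# Tail of the sample line; "\t1\t80" is a hypothetical stand-in for the
# fields that the post cut off.
LINE = "4270: HTTP: PHP Code Injection\t4270\t1\t80"

# Non-greedy .+? : takes as little as possible, stops at the FIRST "\t<digits>".
lazy = re.search(r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>.+?)\t\d+", LINE)
print(repr(lazy.group("ATTACKNAME")))  # 'HTTP: PHP Code Injection'
```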
Thanks, Ayn, for the answer.
I also managed to get the same result by replacing .+ with [^\t]+
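A Python sketch of the negated-class alternative on the same hypothetical line (the trailing "\t1\t80" fields are invented stand-ins). [^\t]+ can never cross a tab, so it stops at the first one whether the quantifier is greedy or not, and the engine does not have to run to the end of the line and backtrack.

```python
import re

# Tail of the sample line; "\t1\t80" is a hypothetical stand-in for the
# fields that the post cut off.
LINE = "4270: HTTP: PHP Code Injection\t4270\t1\t80"

# [^\t]+ matches "anything except a tab", so the group ends at the first tab.
negated = re.search(r"(?P<ATTACKID>\d+):\s+(?P<ATTACKNAME>[^\t]+)\t\d+", LINE)
print(repr(negated.group("ATTACKNAME")))  # 'HTTP: PHP Code Injection'
```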