I saw this in \etc\system\README\transforms.conf.example:
REGEX = (?m)^(.*)SessionId=\w+(\w{4}[&"].*)$
What does the (?m) mean before the caret? Is this really matching 0 or 1 "m" characters at the end of the previous line, or does it have some special meaning?
It turns on multiline mode for the regex, which matters when the data you are matching contains line breaks.
The (?flags) construct, where flags is one or more option letters, lets you set matching properties inline, such as case-insensitivity (i), multiline (m), and ungreedy (U). See http://www.regextester.com/pregsyntax.html for more info.
In general, all Splunk regexes use the PCRE flavor of regex, which is substantially the same regex syntax as Perl, Python, PHP, but significantly different from grep (or egrep).
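As a rough illustration of inline option flags, here is a minimal sketch using Python's re module. Python's re is not PCRE, but it accepts the same (?i), (?m) and (?s) prefixes, so the behaviour shown should carry over. The sample string is made up for the example.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "SessionId=ABC123"

# Without a flag, matching is case-sensitive, so this finds nothing.
case_sensitive = re.search(r"sessionid=\w+", text)

# (?i) at the start of the pattern turns on case-insensitive matching.
case_insensitive = re.search(r"(?i)sessionid=\w+", text)

# (?s) lets "." match a newline, which it does not do by default.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"(?s)a.b", "a\nb")

print(case_sensitive)            # None
print(case_insensitive.group())  # SessionId=ABC123
print(dot_default)               # None
print(dot_all is not None)       # True
```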
More importantly, this "multiline mode" means that ^ and $ match the beginning and end (respectively) of each line, not the beginning and end of the entire string. This is important in multiline events, in case you want to find an item at the beginning of a line somewhere in your event.
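To make the anchor behaviour concrete, here is a small sketch in Python (again, not PCRE itself, but (?m) works the same way for ^ and $). The two-line event is a made-up example.

```python
import re

# Hypothetical two-line event, invented for illustration.
event = "first line user=alice\nsecond line user=bob"

# Without (?m): ^ matches only at the very start of the string.
without_flag = re.findall(r"^(\w+) line", event)

# With (?m): ^ also matches immediately after each newline.
with_flag = re.findall(r"(?m)^(\w+) line", event)

print(without_flag)  # ['first']
print(with_flag)     # ['first', 'second']
```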
And just as importantly, rex will match against the entire event without (?m) even if there are line breaks. With max_match=0, rex will even match on the pattern multiple times in the same event thus creating a multi-value field.
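The same point can be sketched outside Splunk. In the Python snippet below, an unanchored pattern finds text on a later line without (?m), and re.findall collects every match, which is only a loose analogy for what rex does with max_match=0. The event text is invented for the example.

```python
import re

# Hypothetical three-line event, invented for illustration.
event = "id=1 status=ok\nid=2 status=fail\nid=3 status=ok"

# No (?m) needed: an unanchored pattern is searched across line breaks.
first_fail = re.search(r"id=(\d+) status=fail", event)

# findall returns every match, similar in spirit to a multi-value field.
all_ids = re.findall(r"id=(\d+)", event)

print(first_fail.group(1))  # 2
print(all_ids)              # ['1', '2', '3']
```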