We're trying to extract fields that match this pattern: [ FIELD_NAME = S0m3 Valu3 w\ reaLLy $pec!aL ch*rac+3rs ]
and write them to tsidx so that they're consumable by tstats.
We're using the transforms/props pairing below:
# transforms.conf
[hello_transforms]
REGEX = (?<key>[\w]+)\s\=\s(?<value>[^\]]+)
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true
# props.conf
[hello]
DATETIME_CONFIG =
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
TRANSFORMS-capturer = hello_transforms
While it does what's expected for most of the fields (i.e. fields are written to disk, verified through walklex), some values are not captured in full or as expected. For example:
[ REMARKS = A Kerberos authentication ticket (TGT) was requested. ]
Splunk only captured "A". See screenshot below.
REGEX VALID:
Do you think this is the fault of Splunk's regex engine, or do I have something wrong in my configs?
Thanks in advance.
Sample:
| makeresults
| eval _raw="Feb 7 11:25:20 SYD-UTIL-02 ADAuditPlus [ Category = LogonReports ] [ REMARKS = A Kerberos authentication ticket (TGT) was requested. ]"
| rex max_match=0 "\[\s*(?<key>\S+)\s\=\s(?<value>.*?)\]"
transforms.conf
REGEX = \[\s*(\S+)\s\=\s(.*?)\]
The closing ] needs to be part of the regex.
If you use FORMAT in transforms.conf, named capture groups are not needed.
Using FORMAT:
REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2
Not using FORMAT:
REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)
Same result
@marethanyell
Did you restart/refresh Splunk?
At least [ REMARKS = A Kerberos authentication ticket (TGT) was requested. ]
does not give the same result.
I edited transforms.conf with your regex, stopped Splunk, and deleted the index using "clean eventdata" (don't worry, it's a dev machine). Then I restarted Splunk and re-indexed the file using one-shot. It still fails to capture the entire value. It stops at whitespace.
My old regex also works with | rex
but it does not in transforms.conf.
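For anyone who wants to check the patterns outside Splunk, here is a minimal Python sketch that runs both regexes from this thread against the sample event. Python's re module is not Splunk's PCRE engine and spells named groups (?P<name>...), so treat this as an approximation only; the helper name pairs is made up for illustration. All it shows is that both patterns span the whitespace inside the value, which suggests the truncation does not come from the match itself.

```python
import re

# Sample event taken from this thread.
raw = ("Feb 7 11:25:20 SYD-UTIL-02 ADAuditPlus [ Category = LogonReports ] "
       "[ REMARKS = A Kerberos authentication ticket (TGT) was requested. ]")

# Original pattern from transforms.conf, with (?<name>...) rewritten as (?P<name>...).
original = re.compile(r"(?P<key>[\w]+)\s\=\s(?P<value>[^\]]+)")
# Pattern suggested in the answer, anchored on the closing bracket.
suggested = re.compile(r"\[\s*(\S+)\s\=\s(.*?)\s\]")

def pairs(pattern, text):
    """Hypothetical helper: return every (key, value) pair the pattern finds."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]

original_pairs = pairs(original, raw)
suggested_pairs = pairs(suggested, raw)

for key, value in suggested_pairs:
    print(f"{key} -> {value!r}")
```

The original pattern also captures the whole value, but keeps the trailing space before the closing bracket because [^\]]+ runs right up to it.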
@morethanyell
We both made a mistake. My answer is updated.
I'm sorry.
Same issue, mate. I've used your transforms and it still fails to capture the entire thing and halts at whitespace
[aap_fields_discov]
REGEX = \[\s*(\S+)\s\=\s(.*?)\s\]
REPEAT_MATCH = true
WRITE_META = true
(T_T)
SEDCMD-whitespace = s/\s/ /g
Why does the REGEX halt at whitespace?
I don't understand.
On paper, it should capture this:
[ FIELDNAME = The quick brown fox jumps over the lazy dog. ]
If you try it with | rex
or on regex101.com, it does work. But when implemented in transforms.conf, it only captures "The", so the field value will be "FIELDNAME = The" instead of the entire "FIELDNAME = The quick brown fox jumps over the lazy dog."
There's no point anymore in showing evidence that the regex works via | rex
or regex101.com because, as I've said before, it does work there. It just doesn't when used in transforms.conf for index-time field extraction.
Out of frustration, I've changed strategy: I now enclose the values in double quotes (e.g. [ FIELDNAME = s0m3 vaLu3 ]
becomes [ FIELDNAME ="s0m3 vaLu3" ]
) using SEDCMD in props.conf instead of transforms.conf.
Thanks for the help.
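The actual SEDCMD expression was not posted, so the following is only a Python sketch of the kind of rewrite described above, turning [ KEY = value ] into [ KEY ="value" ]. The function name quote_values and the exact pattern are assumptions for illustration, not the configuration that was deployed.

```python
import re

def quote_values(raw: str) -> str:
    """Hypothetical rewrite: wrap each bracketed value in double quotes."""
    # \1 is the field name and \2 is the value without the padding spaces.
    return re.sub(r"\[\s*(\S+)\s=\s(.*?)\s\]", r'[ \1 ="\2" ]', raw)

sample = "ADAuditPlus [ Category = LogonReports ] [ FIELDNAME = s0m3 vaLu3 ]"
quoted = quote_values(sample)
print(quoted)
```

Note that this sketch does not escape double quotes that already appear inside a value, so such values would need extra handling.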