I am trying to create a regular expression that extracts several key pieces of data from a syslog event that has been consumed by Splunk. This is how the data appears in Splunk:
Note: I have scrubbed some of the data for privacy purposes; string1, string2, and string3 are not the actual values.
2018-10-12 05:15:19 Local7.Debug string1 string2 1.3.6.1.4.1.1.3.6.1.4.1.26393.12.1.0.1 1539321310 10.2.104.150 Ver2 1.3.6.1.4.1.26393.99.10.1.1 string3 1.3.6.1.4.1.26393.99.10.1.2 0 1.3.6.1.4.1.26393.99.10.1.3 5 1.3.6.1.4.1.26393.99.10.1.4 \x83
1.3.6.1.4.1.26393.99.10.1.5 Controller: CTIHost sent a fail hard. 1.3.6.1.4.1.26393.99.10.1.6 1.3.6.1.4.1.26393.99.10.1.7 1.3.6.1.4.1.26393.99.10.1.8 1.3.6.1.4.1.26393.99.10.1.9 1.3.6.1.4.1.26393.99.10.1.10 1.3.6.1.4.1.26393.99.10.1.11 0 1.3.6.1.4.1.26393.99.10.1.12 0
I have successfully created two regular expressions in the Splunk 'Extract Fields' tool, but neither one captures the actual error message, which in the example above is "Controller: CTIHost sent a fail hard.". When I try to add that text to my field extraction, Splunk reports that the expression contains invalid characters. I believe these invalid characters (non-printing control characters embedded in the event) are causing the problem.
This is the regular expression Splunk shows while the extraction is being created:
^[^\\\n]*\\\w+\d+\s+\s+\s+\d+\.\d+\.\d+\.\d+\.\d+\.\d+\.(?P\d+\.\d+\.\d+\.\d+\.\d+\s+\w+:\s+\w+\s+\w+\s+\w+\s+\w+\s+\w+\.\s+\d+\.\d+)
When I paste this data into Notepad++, the different control characters appear as black boxes labeled with their abbreviations.
I looked up the corresponding Unicode code points and identified these characters:
BEL = U+0007
FF = U+000C
ENQ = U+0005 (U+2405, ␅, is the visible symbol used to depict it)
SI = U+000F
EOT = U+0004
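To confirm exactly which control characters an event contains, and at which offsets, a quick offline check can help before writing the extraction. The following is a minimal Python sketch run outside Splunk, not a Splunk feature. The sample string is hypothetical (the real characters were stripped from the post), and find_control_chars and C0_NAMES are illustrative names.

```python
import unicodedata

# Abbreviations for the control characters listed above. unicodedata.name()
# raises ValueError for control characters, so a small lookup table is used.
C0_NAMES = {0x04: "EOT", 0x05: "ENQ", 0x07: "BEL", 0x0C: "FF", 0x0F: "SI"}


def find_control_chars(text):
    """Return (index, 'U+XXXX', abbreviation) for each control character."""
    found = []
    for i, ch in enumerate(text):
        # Unicode category "Cc" covers all control characters, including \t and \n.
        if unicodedata.category(ch) == "Cc":
            code = ord(ch)
            found.append((i, f"U+{code:04X}", C0_NAMES.get(code, "?")))
    return found


# Hypothetical fragment of an event with an EOT and a BEL embedded in it.
sample = "Ver2 \x04 Controller: CTIHost sent a fail hard.\x07"
print(find_control_chars(sample))
```

Control characters not in the table (for example a line feed) are still reported, with "?" as the abbreviation.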
How can I write a regular expression that gets past these characters?
The field extraction wizard is not particularly smart about how it builds regex strings. It is not necessary to match every character from the beginning of the event to the desired field; you only need a unique starting point. In your sample event, I used the EOT character (\x04). Try this regex to see if it works for you:
\x04[\s\S]+\s(?P<CC_Error>\w+:[^\.]+)
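Splunk field extractions use PCRE, and Python's re module accepts the same (?P<name>...) group syntax and \x04 escape, so the pattern can be sanity-checked offline. This is a minimal sketch, not Splunk itself: the event string is a hypothetical, shortened reconstruction, and the placement of the control characters and the line break after EOT are assumptions. It also shows why the pattern uses [\s\S]+ rather than .+: the dot does not match a newline unless the (?s) flag is set, so it cannot skip over a line break.

```python
import re

# Pattern from the answer: anchor on EOT (\x04), skip everything after it
# (newlines and control characters included), then capture "Word: ..." up to
# the first period.
CC_ERROR = re.compile(r"\x04[\s\S]+\s(?P<CC_Error>\w+:[^\.]+)")

# Same pattern with "." instead of "[\s\S]"; "." stops at "\n" without (?s).
CC_ERROR_DOT = re.compile(r"\x04.+\s(?P<CC_Error>\w+:[^\.]+)")

# Hypothetical event: control-character positions are assumptions.
EVENT = (
    "2018-10-12 05:15:19 Local7.Debug string1 string2 "
    "1.3.6.1.4.1.26393.12.1.0.1 1539321310 10.2.104.150 Ver2 "
    "1.3.6.1.4.1.26393.99.10.1.4 \x04\n"
    "1.3.6.1.4.1.26393.99.10.1.5 Controller: CTIHost sent a fail hard. "
    "1.3.6.1.4.1.26393.99.10.1.6 \x07\x0c\x05\x0f "
    "1.3.6.1.4.1.26393.99.10.1.12 0"
)


def extract_cc_error(pattern, event):
    """Return the CC_Error group, or None if the pattern does not match."""
    m = pattern.search(event)
    return m.group("CC_Error") if m else None


print(extract_cc_error(CC_ERROR, EVENT))
print(extract_cc_error(CC_ERROR_DOT, EVENT))
```

Because [^\.]+ stops at the first period, the captured value excludes the trailing "." of the message. The greedy [\s\S]+ backtracks to the last whitespace followed by "word:", which in this event is the one before "Controller:".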