Splunk Search

Using regex, how do you extract data when there are special characters?

meinfan
New Member

I am trying to create a regular expression that extracts several key pieces of data from a syslog event that has been ingested by Splunk. This is how the data appears in Splunk:

Note: I have scrubbed some of the data for privacy purposes. string1, string2, and string3 are placeholders, not the actual values.

2018-10-12 05:15:19 Local7.Debug    string1 string2 1.3.6.1.4.1.1.3.6.1.4.1.26393.12.1.0.1 1539321310 10.2.104.150 Ver2 1.3.6.1.4.1.26393.99.10.1.1 string3 1.3.6.1.4.1.26393.99.10.1.2 0 1.3.6.1.4.1.26393.99.10.1.3 5 1.3.6.1.4.1.26393.99.10.1.4 \x83

 1.3.6.1.4.1.26393.99.10.1.5 Controller: CTIHost sent a fail hard. 1.3.6.1.4.1.26393.99.10.1.6  1.3.6.1.4.1.26393.99.10.1.7  1.3.6.1.4.1.26393.99.10.1.8  1.3.6.1.4.1.26393.99.10.1.9  1.3.6.1.4.1.26393.99.10.1.10  1.3.6.1.4.1.26393.99.10.1.11 0 1.3.6.1.4.1.26393.99.10.1.12 0

I have successfully created two regular expressions in the Splunk 'Extract Fields' tool to build my results, but the actual error message is still missing. In the example above it is "Controller: CTIHost sent a fail hard.". When I try to add this to my field extraction, Splunk reports an error saying the expression contains invalid characters. I believe the invalid characters are what is causing my issue.

This is the regular expression Splunk shows while it is being created:

^[^\\\n]*\\\w+\d+\s+\s+\s+\d+\.\d+\.\d+\.\d+\.\d+\.\d+\.(?P\d+\.\d+\.\d+\.\d+\.\d+\s+\w+:\s+\w+\s+\w+\s+\w+\s+\w+\s+\w+\.\s+\d+\.\d+)

When I put this data into Notepad++, the special characters appear as black boxes labeled with their control-character names.

I researched the Unicode code points, and this is what I was able to identify:

BEL = U+0007
FF = U+000C
ENQ = U+2405 or U+0005
SI = U+000F
EOT = U+0004

How can I write a regular expression that gets past these characters?
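
For reference, control characters like the ones listed above can be written in a regex as two-digit hex escapes (for example, EOT is \x04). The sketch below is only an illustration in Python, whose re module accepts the same \xNN escapes as the PCRE engine Splunk uses. The `fragment` string is a hypothetical stand-in, since the original post does not show where each control character sits in the event.

```python
import re

# Control characters the poster identified, keyed by their ASCII names.
CONTROL_CHARS = {"EOT": 0x04, "ENQ": 0x05, "BEL": 0x07, "FF": 0x0C, "SI": 0x0F}


def hex_escape(code_point: int) -> str:
    """Return the regex hex escape (backslash, x, two hex digits) for a code point."""
    return f"\\x{code_point:02X}"


escapes = {name: hex_escape(cp) for name, cp in CONTROL_CHARS.items()}

# Hypothetical fragment: the placement of EOT and BEL here is an assumption.
fragment = "1.3.6.1.4.1.26393.99.10.1.4 \x04\x07 1.3.6.1.4.1.26393.99.10.1.5"

# Report which of the named control characters occur in the fragment.
found = [name for name, esc in escapes.items() if re.search(esc, fragment)]
print(escapes)
print(found)
```

Any one of these escapes can serve as an anchor in a pattern, which avoids pasting the raw, non-printable character into the extraction dialog.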

richgalloway
SplunkTrust

The field extraction wizard is not particularly smart about how it creates regex strings. It's not necessary to identify every character from the beginning of the event to the desired field. One only needs to find a unique starting point. In your sample event, I used EOT. Try this regex to see if it works for you.

\x04[\s\S]+\s(?P<CC_Error>\w+:[^\.]+)
---
If this reply helps you, Karma would be appreciated.
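
To try the pattern outside Splunk, here is a minimal sketch in Python. It assumes that Python's re module treats this particular pattern the same way as Splunk's PCRE engine, and it uses a hypothetical reconstruction of the scrubbed event. In particular, the position of the EOT character just before the ...10.1.5 OID is an assumption, because the post does not show where it appears.

```python
import re

# Regex from the reply above: anchor on EOT (\x04), skip ahead, then capture
# "<word>:<text>" up to (but not including) the next period.
CC_ERROR_RE = re.compile(r"\x04[\s\S]+\s(?P<CC_Error>\w+:[^\.]+)")

# Hypothetical reconstruction of the scrubbed event. Where the EOT byte sits
# is an assumption; the original post does not show it.
event = (
    "2018-10-12 05:15:19 Local7.Debug\tstring1 string2 "
    "1.3.6.1.4.1.26393.99.10.1.3 5 1.3.6.1.4.1.26393.99.10.1.4 \x04\n"
    " 1.3.6.1.4.1.26393.99.10.1.5 Controller: CTIHost sent a fail hard."
    " 1.3.6.1.4.1.26393.99.10.1.6  1.3.6.1.4.1.26393.99.10.1.11 0"
)

match = CC_ERROR_RE.search(event)
cc_error = match.group("CC_Error") if match else None
print(cc_error)
```

Because the capture group ends with [^\.]+, the extracted value stops before the first period, so the trailing "." of the message is dropped and a message containing a period in the middle would be cut short at that point.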
