Splunk Search

Using regex, how do you extract data when there are special characters?

meinfan
New Member

I am trying to create a regular expression that extracts several key pieces of data from a syslog event that has been consumed by Splunk. This is how the data appears in Splunk:

Note - I have scrubbed some of the data for privacy purposes. string1, string2, and string3 are not the actual values.

2018-10-12 05:15:19 Local7.Debug    string1 string2 1.3.6.1.4.1.1.3.6.1.4.1.26393.12.1.0.1 1539321310 10.2.104.150 Ver2 1.3.6.1.4.1.26393.99.10.1.1 string3 1.3.6.1.4.1.26393.99.10.1.2 0 1.3.6.1.4.1.26393.99.10.1.3 5 1.3.6.1.4.1.26393.99.10.1.4 \x83

 1.3.6.1.4.1.26393.99.10.1.5 Controller: CTIHost sent a fail hard. 1.3.6.1.4.1.26393.99.10.1.6  1.3.6.1.4.1.26393.99.10.1.7  1.3.6.1.4.1.26393.99.10.1.8  1.3.6.1.4.1.26393.99.10.1.9  1.3.6.1.4.1.26393.99.10.1.10  1.3.6.1.4.1.26393.99.10.1.11 0 1.3.6.1.4.1.26393.99.10.1.12 0

I have successfully created two regular expressions in the Splunk 'Extract Fields' tool to build my results, but they do not capture the actual error message. In the example above, that is "Controller: CTIHost sent a fail hard.". When I try to add this to my field extraction, I get an error in Splunk telling me the expression contains invalid characters. I believe the non-printable characters in the event are causing my issue.

This is what Splunk is showing the Regular Expression as when it is being created:

^[^\\\n]*\\\w+\d+\s+\s+\s+\d+\.\d+\.\d+\.\d+\.\d+\.\d+\.(?P\d+\.\d+\.\d+\.\d+\.\d+\s+\w+:\s+\w+\s+\w+\s+\w+\s+\w+\s+\w+\.\s+\d+\.\d+)

When I paste this data into Notepad++, the special characters appear as black boxes labeled with the character abbreviations.

I researched the Unicode code points, and this is what I was able to identify:

BEL = U+0007
FF = U+000C
ENQ = U+2405 or U+0005
SI = U+000F
EOT = U+0004

How can I write a regular expression that gets past these characters?

1 Solution

richgalloway
SplunkTrust

The field extraction wizard is not particularly smart about how it creates regex strings. It's not necessary to identify every character from the beginning of the event to the desired field. One only needs to find a unique starting point. In your sample event, I used EOT. Try this regex to see if it works for you.

\x04[\s\S]+\s(?P<CC_Error>\w+:[^\.]+)
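
For anyone who wants to check the pattern outside Splunk first, here is a minimal sketch using Python's `re` module, which accepts the same `(?P<name>...)` and `\x04` syntax. The sample string is an assumption: it is a shortened copy of the event from the question with an EOT character inserted after `\x83`, because the scrubbed post does not show exactly where the control characters sit. The helper name `extract_cc_error` is hypothetical.

```python
import re

# Same pattern as above: anchor on EOT (\x04), skip ahead, then capture
# "<word>: <text up to the next period>" into the CC_Error group.
CC_ERROR_RE = re.compile(r"\x04[\s\S]+\s(?P<CC_Error>\w+:[^\.]+)")


def extract_cc_error(event):
    """Return the CC_Error capture, or None when the pattern does not match."""
    match = CC_ERROR_RE.search(event)
    return match.group("CC_Error") if match else None


# Assumed sample: the position of \x04 is a guess, not taken from the real event.
sample_event = (
    "1.3.6.1.4.1.26393.99.10.1.3 5 "
    "1.3.6.1.4.1.26393.99.10.1.4 \x83\x04\n"
    " 1.3.6.1.4.1.26393.99.10.1.5 Controller: CTIHost sent a fail hard. "
    "1.3.6.1.4.1.26393.99.10.1.6  1.3.6.1.4.1.26393.99.10.1.11 0"
)

print(extract_cc_error(sample_event))
```

Note that `[\s\S]+` is greedy, so the capture starts at the last "word followed by a colon" after the EOT. If a real event contains a later colon (or a period inside the message), the pattern would need to be tightened.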
---
If this reply helps you, Karma would be appreciated.
