Splunk Search

How do I search by unicode value?

yyossef
Explorer

Hi,

I have the following example record:

30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; "msisdn":"xxxxxxxxx","Type":"\u0006","APN":"aaa","imsi":"xxxxxxxx","imei":"xxxxxxxxx","SGSN":null,"Remote IP Address":"xx.xx.xx.xx","TotalTimeInMS":0}

I cannot search by Type because it is a Unicode value, and Splunk does not parse it correctly.

There are two possible Type values: "\u0006" and "\u0003".

I am using the following Splunk search:
mysearch | spath input=anyparams | search Type="\u0006"

The problem is that I receive no results.

How should I write the search when the field contains a Unicode value?

Thanks in advance,

Yossi

1 Solution

niketn
Legend

@yyossef, if you are searching for Unicode escapes stored as text, you need to escape the backslash by prefixing it with another backslash, i.e. "\\u0006" or "\\u0003" in your SPL.

The following example uses the same escaping in both a search filter and an eval function:

 <yourCurrentSearch>
| eval TypeDescription=case(Type=="\\u0006","ACKNOWLEDGE",Type=="\\u0003","END OF TEXT",true(),"Others")
| search Type="\\u0006" OR TypeDescription="ACKNOWLEDGE"
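Python string literals follow the same backslash-escape convention, so a short Python sketch can illustrate what the doubled backslash does. This models the idea only and is not Splunk's parser; `type_value_as_text` is a hypothetical field value that assumes the event stores the six characters \u0006 as plain text.

```python
# In a Python string literal, "\\" is one literal backslash, so this is
# the six-character text: backslash, 'u', '0', '0', '0', '6'.
escaped_text = "\\u0006"

# Without the doubled backslash, Python itself decodes the escape into
# the single control character U+0006 (ACKNOWLEDGE).
control_char = "\u0006"

# Hypothetical field value, assuming the event stores the escape as plain text.
type_value_as_text = r"\u0006"  # raw string: the backslash is kept literally

print(len(escaped_text), len(control_char))   # 6 1
print(type_value_as_text == escaped_text)     # True
print(type_value_as_text == control_char)     # False
```

If the field really holds the escape as text, only the doubled-backslash form matches; if it holds the decoded control character, neither "\u0006" typed as text nor "\\u0006" would be the same string.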

The following is a run-anywhere search based on the sample data provided:

| makeresults
| eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"
| extract pairdelim="," kvdelim=":"
| eval TypeDescription=case(Type=="\\u0006","ACKNOWLEDGE",Type=="\\u0003","END OF TEXT",true(),"Others")
| search Type="\\u0006" OR TypeDescription="ACKNOWLEDGE"
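The case() call behaves like a lookup table with a default value. Below is a minimal Python sketch of the same logic, assuming Type holds the escape as text. For reference, in ASCII U+0003 is END OF TEXT (ETX), U+0004 is END OF TRANSMISSION (EOT) and U+0006 is ACKNOWLEDGE (ACK), so the two values from the question map to ACKNOWLEDGE and END OF TEXT. `CONTROL_NAMES` and `type_description` are illustrative names, not part of any Splunk API.

```python
# Names of the ASCII control characters relevant to this thread,
# keyed by the escape text as it would appear in the event.
CONTROL_NAMES = {
    "\\u0003": "END OF TEXT",          # ETX
    "\\u0004": "END OF TRANSMISSION",  # EOT
    "\\u0006": "ACKNOWLEDGE",          # ACK
}


def type_description(type_value: str) -> str:
    """Map a Type value to a label; unknown values fall back to "Others",
    like the true(),"Others" branch of case()."""
    return CONTROL_NAMES.get(type_value, "Others")


for value in ["\\u0006", "\\u0003", "\\u0001"]:
    print(value, "->", type_description(value))
```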
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

MichaelArsenaul
New Member

Just curious: why are Unicode values not being cleansed or translated before the information gets sent to Stripe? As far as I know, data like this very rarely makes its way into Splunk, and much of what passes for weird UTF-8 codes does not make it into Splunk at all.

Ian Quick shared this example code with us that shows how to test for UTF-8 characters and strip them out: https://github.com/Shopify/shopify-tracing/commit/816ba2aef3c6ee8a232766028181b7b1ca03a2b1

I'd highly recommend cleansing your data before emitting to Stripe. Once the data is in Splunk, 99.9% of the UTF code will be lost and Splunk will not help you debug that issue. Cleansing your output before it hits Stripe is probably the best course of action.
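As an illustration of that kind of cleansing, here is a minimal Python sketch that drops Unicode control characters (category "Cc") while keeping tab, newline and carriage return. It is not the code from the linked commit; `strip_control_chars` is a hypothetical helper.

```python
import unicodedata


def strip_control_chars(text: str, keep: str = "\t\n\r") -> str:
    """Drop Unicode control characters (category 'Cc'), except those in `keep`."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch in keep
    )


record = '"Type":"\u0006","APN":"aaa"'   # contains the real control character U+0006
print(strip_control_chars(record))        # "Type":"","APN":"aaa"
```

For this particular field, stripping erases the only information Type carries, so translating the value into a readable label may be the more useful form of cleansing.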

ddrillic
Ultra Champion

Looking at Unicode Character 'ACKNOWLEDGE' (U+0006)

It tells us that \u0006 is not how the character itself is encoded in Unicode/UTF-8; it is an escape notation that several programming languages (and JSON) use to write it.
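This distinction matters for the original search, which used spath. Assuming spath decodes JSON string escapes the way standard JSON parsers do (not confirmed here for spath), the extracted Type would hold the control character itself, and a comparison against the text "\\u0006" would not match. A Python sketch with the standard json module:

```python
import json

# In the raw text, \u0006 is a six-character escape sequence, not the character.
snippet = '{"Type":"\\u0006","APN":"aaa"}'
print(len('\\u0006'))                # 6 characters in the raw text

# A JSON parser decodes the escape into the single control character U+0006.
parsed = json.loads(snippet)
print(repr(parsed["Type"]))          # '\x06'
print(parsed["Type"] == "\\u0006")   # False: no longer the escape text
print(ord(parsed["Type"]))           # 6
```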

yyossef
Explorer

Hi @niketnilay,

Your other suggestion, using searchmatch, worked.

| makeresults
| eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"
| eval TypeDescription=case(searchmatch("\u0006"),"ACKNOWLEDGE",searchmatch("\u0003"),"END OF TEXT",true(),"Others")
| search TypeDescription="ACKNOWLEDGE"

Why does searchmatch work while Type=="\u0006" did not?

niketn
Legend

@yyossef, the Type field is not being extracted automatically during search-time field discovery. The searchmatch() eval function looks for the pattern in the entire raw event, so it works even though no Type field exists. To filter on Type itself, you would need to create your own field extraction that defines Type with a regular expression.
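A simplified Python model of that difference: searchmatch() is treated here as a plain substring test on the raw event (the real function evaluates search syntax), and a field comparison fails whenever the field was never extracted. The event string, the `fields` dictionary and both helper functions are hypothetical.

```python
# Hypothetical event, assuming the escape is stored as plain text in _raw.
raw = r'Return Access ; "msisdn":"xxxxxxxxx","Type":"\u0006","APN":"aaa"'

# Fields from automatic discovery: suppose Type was NOT extracted.
fields = {"msisdn": "xxxxxxxxx", "APN": "aaa"}


def matches_raw(event_raw: str, pattern: str) -> bool:
    """Rough analogue of searchmatch(): look for the text anywhere in the raw event."""
    return pattern in event_raw


def matches_field(event_fields: dict, name: str, value: str) -> bool:
    """Rough analogue of Type=="...": fails whenever the field does not exist."""
    return event_fields.get(name) == value


print(matches_raw(raw, r"\u0006"))               # True
print(matches_field(fields, "Type", r"\u0006"))  # False
```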

I am glad your issue is resolved. Do let us know if you need further help, and do upvote the answers/comments that helped! 🙂

yyossef
Explorer

Hi @niketnilay,

Thanks for your prompt response.
Still no luck; the search result is empty.

When using your example, the result came back with only the default value "Others", meaning no match was found.
I am not sure the Unicode value is stored as text; I think the system displays it as text but stores it as the actual Unicode character.
Do you have any idea how to verify that, or how to search by Unicode value?
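One way to tell the two cases apart is to test a raw event string for each form separately. The Python sketch below assumes you already have the raw event text outside Splunk (the export step is not shown); `describe_type_storage` is a hypothetical helper. Inside Splunk, an analogous check might compare len() of the extracted value, since the escape text is six characters long while the control character is one; treat that as an idea to verify rather than a tested recipe.

```python
def describe_type_storage(raw: str) -> str:
    """Report how the ACKNOWLEDGE value appears in a raw event string."""
    has_escape_text = "\\u0006" in raw   # six literal characters: \ u 0 0 0 6
    has_control_char = "\x06" in raw     # the single control character U+0006
    if has_escape_text and has_control_char:
        return "both"
    if has_escape_text:
        return "escape text"
    if has_control_char:
        return "control character"
    return "neither"


print(describe_type_storage('"Type":"\\u0006"'))   # escape text
print(describe_type_storage('"Type":"\x06"'))      # control character
```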

niketn
Legend

@yyossef, I am not sure whether the Type field is actually being extracted, so let us first try a different approach. The following example does not try to extract the Type field; instead, it searches for the Unicode escape in the raw data.

| makeresults
| eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"
| eval TypeDescription=case(searchmatch("\\u0006"),"ACKNOWLEDGE",searchmatch("\\u0003"),"END OF TEXT",true(),"Others")
| search TypeDescription="ACKNOWLEDGE"