Getting Data In

Splunk corrupts incoming JSON Lines by introducing bogus \x-prefix escape sequence?

Graham_Hanningt
Builder

I was curious to see how Splunk (7.3.1) handles escape sequences in JSON strings, so I created a test file of JSON Lines:

{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}

(For the purposes of this question, please overlook the code and time properties.)

In particular, I was curious to see whether (and when) Splunk resolves the escape sequences in the test property values.

I was happy to see that it does:

alt text

But wait: where's the not sign?

I looked at the raw events in Splunk Web:

{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}
{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}

Note:

  • In case you're wondering, I use a transform to remove the code property.
  • My props.conf file specifies KV_MODE = json

Splunk replaced the not sign in the original incoming JSON Lines with the character sequence \xAC!

While AC is the correct Unicode code point in hexadecimal for a not sign, \x is not a valid escape sequence in JSON!

By introducing this escape sequence, Splunk has corrupted the JSON.

This looks like a bug to me.

I'm wondering what makes the not sign "special"; why it gets this "bogus" (in the context of JSON) escaping, but other characters don't. I note that the other characters are more easily available on a standard US keyboard.

My question(s)

  • Is this behavior a bug, as I suspect?
  • How many other characters are affected by this behavior?
0 Karma

to4kawa
Ultra Champion
| makeresults 
| eval _raw=" {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"| (vertical bar): \u007c\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"@ (commercial at): \u0040\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"# (number sign, hash): \u0023\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"¬ (not sign): \u00AC\"}" 
| multikv noheader=t 
| spath 
| fields - _*
| table code time test

In Splunk version 8, this is fixed.

code    time    test
variant-characters  2019-10-15T10:00:00+08:00   | (vertical bar): |
variant-characters  2019-10-15T10:00:00+08:00   @ (commercial at): @
variant-characters  2019-10-15T10:00:00+08:00   # (number sign, hash): #
variant-characters  2019-10-15T10:00:00+08:00   ¬ (not sign): ¬ 
0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...