Getting Data In

Why is the null value in a JSON event not being parsed properly as NULL?

anthonysomerset
Path Finder

I have the following data coming into Splunk in JSON format and extracted at index-time:

{ 
    administrativeState:  Unlocked 
    apGroupId:  7954abec-de25-4758-bb5e-0d87be2f0254 
    approvedTime:  1435500193785 
    clientCount:  0 
    configState:  newConfig 
    connectionState:  Disconnect 
    countryCode:  XX 
    cpId:  15e2811e-a24f-4570-bd58-8f18c7e6ed91 
    description: 
    dpId: 
    externalIp:  xxx.xx.xx.xx
    externalPort:  55540 
    ip:  xxx.xx.xx.xx
    ipType:  Static 
    ipv6: 
    ipv6Type:  null 
    isCriticalAP:  false 
    lastSeenTime:  1454078438291 
    latitude:  null 
    location:  XXXXXXXXX 
    longitude:  null 
    mac:  F0:B0:52:3B:F2:10 
    meshHop:  0 
    meshRole:  null 
    model:  T300 
    name:  XXXXXXXXX
    provisionMethod:  Discovered 
    provisionStage:  null 
    registrationState:  Approved 
    serial:  491484101277 
    uptime:  81630 
    version:  3.1.1.0.349 
    wifi24Channel:  2 
    wifi50Channel:  104 
    zoneId:  d41a3397-b859-42cd-9332-e308a48069b2 
}

Note: The latitude and logitude fields in this case are null and Splunk's syntax highlighter shows it like null or boolean would be in any editor.

I can search for longitude=null no problems, however, when it comes to enriching the event with other data from a lookup and trying to coalesce the field with another like so:

eval merged_latitude=coalesce(latitude,zone_lat,0)

Then it appears to be assuming that null is actually not null and using the null value rather than attempting to look at the next field or even the fail safe 0 value.

In order to get the null to be correctly seen as NULL I have to insert the following into my search:

eval latitude=if(latitude="null",FALSE,latitude)

This seems perhaps the "wrong" way to fix the problem, is there not a better approach to this?

1 Solution

somesoni2
SplunkTrust
SplunkTrust

Splunk sees "null" as a valid string value, hence all the issues. (and actually there is no notation that can be used to denote null values other then value not present at all). So to fix this, either you can replace all null with blank (no value) in the raw data before indexing (works only for future data) OR handle the same in search time.

For option 1, see these

https://answers.splunk.com/answers/210096/how-to-configure-sedcmd-in-propsconf.html
http://docs.splunk.com/Documentation/Splunk/6.2.6/Data/Anonymizedatausingconfigurationfiles

For option 2, you already have one option listed. Others options could be like this

your base search | replace "null" with "" in field1, field2, field3

your base search | rex mode=sed "s/null//g"

View solution in original post

somesoni2
SplunkTrust
SplunkTrust

Splunk sees "null" as a valid string value, hence all the issues. (and actually there is no notation that can be used to denote null values other then value not present at all). So to fix this, either you can replace all null with blank (no value) in the raw data before indexing (works only for future data) OR handle the same in search time.

For option 1, see these

https://answers.splunk.com/answers/210096/how-to-configure-sedcmd-in-propsconf.html
http://docs.splunk.com/Documentation/Splunk/6.2.6/Data/Anonymizedatausingconfigurationfiles

For option 2, you already have one option listed. Others options could be like this

your base search | replace "null" with "" in field1, field2, field3

your base search | rex mode=sed "s/null//g"

Graham_Hanningt
Builder

Thanks for this answer. I just ran into the same issue (in Splunk Enterprise 6.4).

I am sending JSON-format logs to Splunk via the HTTP Event Collector (EC). Some field values are explicitly set to null:

some_performance_indicator_field: null

One reason for setting a field value to null, rather than omitting the field, is to indicate that the version of the performance monitoring system that generated this data does report that performance indicator (from the underlying transaction processing system being monitored), but that, in this instance, there was no value to report. As opposed to a (typically earlier) system that does not report that indicator, and so that field does not occur in its output. A null value tells you something about the performance indicator, whereas, if the field is absent, the performance indicator might have had any value; it just wasn't reported.

So (with apologies to readers who are well aware of this), there can be a significant semantic difference in JSON between a field being set to the explicit value null and that field being omitted.

It appears (and my own ad hoc testing confirms this answer) that Splunk interprets the JSON null field value as the string value "null".

That behavior is consistent with the following definition in the Splunk documentation topic "The search processing language syntax":

Null field
A null field is not present on a particular result or event.

(Although I would prefer it if the documentation explicitly covered the case of the JSON null value being interpreted as the string value "null", and any similar issues with other input data formats. Perhaps it does; just not in that topic.)

I presume (I haven't tested this) that this means that, for the purposes of searching in Splunk, the following JSON KV pair:

myfield: null

is identical to:

myfield: "null"
0 Karma

Graham_Hanningt
Builder

This is still an issue for me in Splunk 7.3.0.

Given the following input JSON Lines:

{"code":"test_null_value","time":"2019-11-07T08:00:00+08:00","test":null}
{"code":"test_null_value","time":"2019-11-07T08:00:01+08:00","test":"null"}
{"code":"test_null_value","time":"2019-11-07T08:00:02+08:00","test":"present"}

(For the purposes of this comment, you can ignore the code and time fields.)

the following search:

sourcetype="test_null_value" test="null"

returns two results: the event with "null" and the event with null.

Curiously, Splunk Web syntax-highlights the null in "test":null in orange, whereas the string value null in "test":"null" is green.

The following search:

sourcetype="test_null_value" | where isnull(test)

returns no results, as expected: the (JavaScript) value null does not correspond to Splunk's "not present" concept of NULL.

I don't know of any way in a Splunk search to distinguish between the string value "null" and the value null. Splunk seems to treat them as identical string values. It seems that orange syntax highlighting in the Search app is just for show; skin deep.

0 Karma

malvidin
Communicator

You can work around the "null" vs. null issue in some cases. You can use spath to get a bit more specific with nested JSON, but it isn't as simple as simple search criteria.

index=_internal
| head 1
| fields _time, _raw
| eval _raw = "{\"nullable_field\": null, \"nullable_list_0\": [ null ], \"nullable_list_1\": [ \"null\", null ] }"
| spath
| regex "\"nullable_field\"\s*:\s*null\b" `comment("single values are easier to identify than lists; could be converted to a macro")`
| eval includes_null_as_string = if(searchmatch("\"null\""), "true", "false") `comment("events with string that is not null")`

 

anthonysomerset
Path Finder

Thanks - at least your suggestions are much more efficient than mine

0 Karma

anthonysomerset
Path Finder

So it turns out that the first command the value is still treated as non NULL 😞 and the rex command doesn't even appear to be doing anything at all 😞

0 Karma
Get Updates on the Splunk Community!

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...