Getting Data In

Issue with indexing JSON output from application

raffinyc
Explorer

Our application, running in OpenShift 3.6 (Kubernetes/Docker), generates raw JSON like this to STDOUT (when running standalone). We're not logging to files right now because we couldn't get file-based logs into Splunk (we're not sure where apps should log in OpenShift, and that could be part of our issue).

{"@timestamp":"2017-11-29T01:04:40.938-05:00","@version":1,"message":"Preparing to transfer $1028.","logger_name":"ofi.paas.ocp.logging.splunk.loggers.Slf4TransferService","thread_name":"pool-1-thread-1","level":"INFO","level_value":20000,"transactionOwner":"Susan","transactionId":"9fe28a5b-7ed1-4adc-b852-b4exxxxxxxxx","appname":"my-java-app"}

We can see these log events in Splunk, but they're not indexed correctly: none of our custom fields are available for searching. Also, the log events in Splunk appear to wrap the raw event generated by our app, so they have a slightly different structure compared to when the app logs standalone (not in OpenShift):

{"log", <RAW LOG DATA HERE>}\n","stream":"stdout","time":"<TIMESTAMP>"}

Here's an example of the wrapping:

{"log":"{\"@timestamp\":\"2017-11-29T06:44:58.240+00:00\",\"@version\":1,\"message\":\"Has transfer of $681 completed successfully ? true.\",\"logger_name\":\"ofi.paas.ocp.logging.splunk.loggers.Slf4TransferService\",\"thread_name\":\"pool-1-thread-1\",\"level\":\"INFO\",\"level_value\":20000,\"transactionOwner\":\"John\",\"transactionId\":\"9c2025d7-399d-492c-84f1-3a5xxxxxxxxx\",\"appname\":\"my-java-app\"}\n","stream":"stdout","time":"2017-11-29T06:44:58.240849341Z"}

I suspect some intermediary component between OpenShift and Splunk is doing this wrapping, which might be throwing off the parser. That's my take, but I'm new to Splunk so I'm not entirely sure.


ssadh_splunk
Splunk Employee

It's a super late response, but I ran into a similar issue.
Special thanks to @abhagat_splunk.

Append the following to the base search to parse out the data:

| rex mode=sed "s/\\\\\"/\"/g"
| rex mode=sed "s/\"{/{/g"
| rex mode=sed "s/}\"/}/g"
| spath


mattymo
Splunk Employee

**UPDATE - July 2019: Use Splunk Connect for Kubernetes!**

It makes easy work of this data, and the Splunk App for Infrastructure will add OpenShift support for easy onboarding in an upcoming release!

https://github.com/splunk/splunk-connect-for-kubernetes

https://splunkbase.splunk.com/app/3975/

I have pushed a props.conf to our Docker monitoring repo which unwraps the JSON for you, supports multiline logs, and removes escape characters and encoding cruft from the logs.

https://github.com/splunk/docker-itmonitoring/blob/7.0.0-k8s/README-k8s.md

I'd be happy to walk you through it in the Slack chat if you like (splk.it/slack); my handle is @mattymo. Big up to Micah and automine for helping me nail the SED commands.

# Experimental props to remove json wrapper, escaped quotes, encoding and other cruft from docker logs. 
#
[kubernetes]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
# remove docker json wrapper, then remove escapes from the quotes in the log message. 
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
# another experimental version of the sed.
#SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*)\\n","stream.*?([\n\r])/\1\2/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC

The approach I am taking is to use a base sourcetype called kubernetes to take care of unwrapping the JSON logs, and then use source-based props to apply multiline log support, rename the sourcetype, or apply other specific parsing for each different log type.
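For the sourcetype renaming piece, a rough sketch of what such a source-based stanza could look like (the path and sourcetype name below are placeholders, and you should verify how the override interacts with the base kubernetes props in your environment):

# Hypothetical example only: route logs from one app's container log files to their own
# sourcetype so that app-specific props can be keyed to it.
[source::/var/log/containers/my-java-app-*.log]
sourcetype = kube:my-java-app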

For example, I use this to implement a line breaker for java multiline logs that come from a carts pod:

[source::/var/log/containers/carts-(?!db-)*.log]
#SHOULD_LINEMERGE = true 
SHOULD_LINEMERGE = false 
NO_BINARY_CHECK = true
#BREAK_ONLY_BEFORE = \d{4}\-\d{2}\-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{3}
LINE_BREAKER = ([\n\r]+){"log":"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}\s
CHARSET=UTF-8
disabled=false

These two configs in conjunction get you what you're looking for.


- MattyMo

raffinyc
Explorer

We determined the problem on our end to be caused by the Docker daemon (dockerd), which by default uses --log-driver=json-file. Not wanting to mess with such a low-level config, we left the driver as-is and instead changed our Splunk query:

source= | spath input=log

Simple enough! This extracts the fields at search time rather than at index/parse time (index-time extraction was our original goal), but it performs well enough for our needs.
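In case it helps anyone else, here's a sketch of what our searches now look like (the source path and filter values are made up for illustration):

source="/var/log/containers/my-java-app-*.log"
| spath input=log
| search level=INFO appname="my-java-app"
| table _time, message, transactionOwner, transactionId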

outcoldman
Communicator

Hi @raffinyc
We have developed an application for monitoring OpenShift clusters, including forwarding logs (containers, hosts, and OpenShift components) and monitoring stats (CPU, memory, I/O, etc.) at the level of processes, containers, pods, and hosts. You can get the application from Splunkbase at https://splunkbase.splunk.com/app/3836/ and find installation instructions at https://www.outcoldsolutions.com/docs/monitoring-openshift/


niketn
Legend

@raffinyc, for the existing wrapped JSON data being sent to Splunk, you might have to extract the inner JSON and replace the escaped quotes \" with quotes " for spath to work (at search time).

 <YourBaseSearch>
|  rex "{\"log\":\"(?<jsonData>.*)"
|  eval _raw=replace(jsonData,"\\\\\"","\"")
|  spath

The following is valid JSON, which should work straight out of the box if indexed into Splunk:

{"log": {"@timestamp":"2017-11-29T06:44:58.240+00:00", "@version":1, "message":"Has transfer of $681 completed successfully ? true.", "logger_name":"ofi.paas.ocp.logging.splunk.loggers.Slf4TransferService", "thread_name":"pool-1-thread-1", "level":"INFO", "level_value":20000, "transactionOwner":"John", "transactionId":"9c2025d7-399d-492c-84f1-3a51111d967d", "appname":"my-java-app"} ,"stream":"stdout" ,"time":"2017-11-29T06:44:58.240849341Z"}

The following is a run-anywhere search with your mocked sample data (PS: I have escaped the quotes with backslashes to generate mock data in Splunk for demo/testing; the same is not required when indexing):

|  makeresults
|  eval _raw="{\"log\":
    {\"@timestamp\":\"2017-11-29T06:44:58.240+00:00\",
    \"@version\":1,
    \"message\":\"Has transfer of $681 completed successfully ? true.\",
    \"logger_name\":\"ofi.paas.ocp.logging.splunk.loggers.Slf4TransferService\",
    \"thread_name\":\"pool-1-thread-1\",
    \"level\":\"INFO\",
    \"level_value\":20000,
    \"transactionOwner\":\"John\",
    \"transactionId\":\"9c2025d7-399d-492c-84f1-3a51111d967d\",
    \"appname\":\"my-java-app\"}
    ,\"stream\":\"stdout\"
    ,\"time\":\"2017-11-29T06:44:58.240849341Z\"}"
|  spath

Please see if correcting the wrapped JSON data before indexing to Splunk is feasible; if not, try the replace() workaround stated above.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

raffinyc
Explorer

Thanks for the pointers; interesting options, and we will definitely try some of these out.

We determined the problem on our end to be caused by the Docker daemon (dockerd), which by default uses --log-driver=json-file. Not wanting to mess with such a low-level config, we left the driver as-is and instead changed our Splunk query:

source= | spath input=log

Simple enough! This extracts the fields at search time rather than at index/parse time (index-time extraction was our original goal), but it performs well enough for our needs.

Thanks again!


niketn
Legend

@raffinyc, you should convert your own comment into an answer and accept it to mark this question as answered.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

raffinyc
Explorer

ok, will do!


raffinyc
Explorer

To add: what is the standard Splunk prefix for log events? My example above uses this:

{"log":

...but nearly all the examples I see use this instead. Does it make any difference?

"message": {