Getting Data In

How to parse JSON within Docker JSON?

maggietempleton
Engager

Hello,

I've been trying to parse logs from Docker and used this Splunk answer (https://answers.splunk.com/answers/611715/docker-logs-produced-in-raw.html) to extract the underlying logs from the Docker JSON.

The underlying logs are also in JSON, so I'm trying to get Splunk to recognize the opening "{" as the start of the event. However, I'm finding that some sources are still dividing each line of the log into a separate event, while others are creating a single event containing multiple JSON blobs.

Here is my props.conf:

[source::/var/log/containers/*]
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
LINE_BREAKER = ([\n\r]+){"log":"{\n   # setting line break as opening "{" in underlying JSON
CHARSET = UTF-8
disabled = false

[container_json]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC
KV_MODE = json

This is the log sent from Docker:

{"log":"{\n","stream":"stdout","time":"2018-03-06T18:56:08.648972915Z"}
{"log":"  \"time\": \"2018-03-06 18:56:08.648636Z\",\n","stream":"stdout","time":"2018-03-06T18:56:08.649029831Z"}
{"log":"  \"nothing_to_update\": true,\n","stream":"stdout","time":"2018-03-06T18:56:08.64903929Z"}
{"log":"  \"events\": [\n","stream":"stdout","time":"2018-03-06T18:56:08.649045009Z"}
{"log":"\n","stream":"stdout","time":"2018-03-06T18:56:08.649050131Z"}
{"log":"  ]\n","stream":"stdout","time":"2018-03-06T18:56:08.649054914Z"}
{"log":"}\n","stream":"stdout","time":"2018-03-06T18:56:08.649059571Z"}

This is the extracted source in Splunk, but each line is showing up as a separate event:

{
  "time": "2018-03-06 18:56:08.648636Z",
  "nothing_to_update": true,
  "events": [
  ]
}

I have other source files that seem to be working, but they are concatenating several JSON logs together. This source file shows up as a single event:

{
  "time": "2018-03-06 18:56:18.507756Z",
  "events": [
    "No emails to send"
  ]
}
{
  "time": "2018-03-06 18:56:18.514313Z",
  "events": [
    "No emails to send"
  ]
}

I've tried many different props.conf configurations, and this is the closest I've gotten to parsing the JSON properly. The extracted source for both examples is valid JSON, so I'm not sure why some source files are divided into line-by-line events while others combine multiple JSON events into one.

Any help would be greatly appreciated!

1 Solution

Lowell
Super Champion

Looks like you were going in the right direction with your props settings. Try changing your LINE_BREAKER to this:

 LINE_BREAKER = ([\n\r]+)\s*{"log":"{\\n

I'm not sure if you literally have that comment on the same line as LINE_BREAKER, but if so, move it to the line before or after. Beyond that, I (1) added escaping for the literal \n in your data, and (2) allowed spaces before the starting {; the latter could simply be a copy-and-paste artifact from posting here, but allowing it shouldn't hurt either way.
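
To convince yourself why the extra backslash matters, here is a quick Python check (my illustration only; Splunk compiles LINE_BREAKER itself). In the raw file, the \n inside the wrapper is two literal characters, a backslash and an "n", so the regex has to escape the backslash:

import re
# Two raw wrapper lines; \n inside them is a literal backslash + "n".
raw = ('{"log":"}\\n","stream":"stdout","time":"...Z"}\n'
       '{"log":"{\\n","stream":"stdout","time":"...Z"}')
# Escaped form: \\n matches the two literal characters backslash + "n".
print(bool(re.search(r'([\n\r]+)\s*{"log":"{\\n', raw)))  # True
# Unescaped form: \n matches a real newline, which never occurs inside
# a wrapper line, so no event boundary would ever be found.
print(bool(re.search(r'([\n\r]+)\s*{"log":"{\n', raw)))   # False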

If you want to use the original timestamps instead of the ones from the wrapper process, you could use something like this (I tested this combo on the sample data provided):

[your_sourcetype]
SHOULD_LINEMERGE=false
CHARSET=UTF-8
LINE_BREAKER=([\n\r]+)\s*{"log":"{\\n
SEDCMD-1=s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2=s/\\"/"/g
TIME_PREFIX=\\"time\\": \\"
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%6N%Z
KV_MODE=json
TRUNCATE=150000
TZ=UTC

Note that TIME_PREFIX has to expect a literal \", because timestamp processing happens before the SEDCMD* replacements take effect.
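
If it helps to see the whole pipeline outside of Splunk, here is a small Python simulation of the two SEDCMDs applied to one broken-out event from the sample data (my own approximation for illustration; Splunk's sed engine may differ in edge cases):

import re
# The sample wrapper lines from the question; \n and \" here are
# literal two-character sequences, exactly as in the raw log file.
raw = r'''{"log":"{\n","stream":"stdout","time":"2018-03-06T18:56:08.648972915Z"}
{"log":"  \"time\": \"2018-03-06 18:56:08.648636Z\",\n","stream":"stdout","time":"2018-03-06T18:56:08.649029831Z"}
{"log":"  \"nothing_to_update\": true,\n","stream":"stdout","time":"2018-03-06T18:56:08.64903929Z"}
{"log":"  \"events\": [\n","stream":"stdout","time":"2018-03-06T18:56:08.649045009Z"}
{"log":"  ]\n","stream":"stdout","time":"2018-03-06T18:56:08.649054914Z"}
{"log":"}\n","stream":"stdout","time":"2018-03-06T18:56:08.649059571Z"}'''
# SEDCMD-1: keep only the inner log fragment from each wrapper line.
event = re.sub(r'{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*', r'\1', raw)
# SEDCMD-2: un-escape the quotes Docker escaped inside the fragments.
event = event.replace('\\"', '"')
print(event)  # prints the reassembled JSON object from the question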

chandrasekharko
Path Finder

I am pulling the logs with a Universal Forwarder. Does this need to be changed on the Universal Forwarder?

mattymo
Splunk Employee

It would need to be on any heavy forwarders or indexers, depending on your environment.
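
For example, on the indexer or heavy forwarder, the stanza could live somewhere like $SPLUNK_HOME/etc/system/local/props.conf (path shown as an example; a deployed app is the usual practice):

# On the indexer or heavy forwarder -- not the UF
[container_json]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\n\r]+)\s*{"log":"{\\n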

- MattyMo

outcoldman
Communicator

If you are willing to try alternative ways of sending logs to Splunk, you can take a look at our solution for monitoring Docker ( https://www.outcoldsolutions.com/ ). It provides advanced configuration for joining log lines into one event, based on a pattern describing how you expect your lines to look.
If you look at our example of how to run our collector ( https://www.outcoldsolutions.com/docs/monitoring-docker/ ), the only two lines you need to add are

 --env "COLLECTOR__SPLUNK_JSON1=pipe.join::json__patternRegex=^{" \
 --env "COLLECTOR__SPLUNK_JSON2=pipe.join::json__matchRegex.docker_container_image=^ubuntu:14\.04$" \

which match our configuration ( https://www.outcoldsolutions.com/docs/monitoring-docker/configuration/ ), if you prefer to configure it with a file instead of environment variables:

[pipe.join::json]
# Set the match pattern for the fields
matchRegex.docker_container_image = ^ubuntu:14\.04$
# All events start with '{'
patternRegex = ^{

That tells our collector that all messages are expected to start with { for containers running from an image matching the regex ^ubuntu:14\.04$. The full command to run is:

docker run -d \
    --name collectorfordocker \
    --volume /sys/fs/cgroup:/rootfs/sys/fs/cgroup:ro \
    --volume /proc:/rootfs/proc:ro \
    --volume /var/log:/rootfs/var/log:ro \
    --volume /var/lib/docker/containers/:/var/lib/docker/containers/:ro \
    --volume /var/run/docker.sock:/var/run/docker.sock:ro \
    --volume collector_data:/data/ \
    --cpus=1 \
    --cpu-shares=102 \
    --memory=256M \
    --restart=always \
    --env "COLLECTOR__SPLUNK_URL=output.splunk__url=https://input.splunk.outcold.net/services/collector/event/1.0" \
    --env "COLLECTOR__SPLUNK_TOKEN=output.splunk__token=670DD88D-AFB5-4DCE-B0C5-F7AD0A7A2FB8"  \
    --env "COLLECTOR__EULA=general__acceptEULA=true" \
    --env "COLLECTOR__SPLUNK_JSON1=pipe.join::json__patternRegex=^{" \
    --env "COLLECTOR__SPLUNK_JSON2=pipe.join::json__matchRegex.docker_container_image=^ubuntu:14\.04$" \
    --privileged \
    outcoldsolutions/collectorfordocker:3.0.86.180207

After that, if I run

docker run ubuntu:14.04 bash -c "sleep 5 && echo '{
    \"a\": \"b\"
}
'"

That results in JSON log lines like these:

{"log":"{\n","stream":"stdout","time":"2018-03-07T05:49:45.33335549Z"}
{"log":" \"a\": \"b\"\n","stream":"stdout","time":"2018-03-07T05:49:45.333434251Z"}
{"log":" }\n","stream":"stdout","time":"2018-03-07T05:49:45.333448344Z"}
{"log":"\n","stream":"stdout","time":"2018-03-07T05:49:45.333459838Z"}

With our solution, they are delivered as a single event:
[screenshot: the multi-line JSON shown as a single event in Splunk]
