Monitoring Splunk

How to properly parse nested JSON logs at index time

ADRIANODL
Explorer

Hi folks,
I've struggled for a couple of days parsing a file that looks like this:

{    "status": {        "timestamp": "2018-10-10T23:40:26.101Z",        "error_code": 0,        "error_message": null,        "elapsed": 14,        "credit_count": 1    },    "data": [        {            "id": 1,            "name": "Bitcoin",            "symbol": "BTC",            "slug": "bitcoin",            "circulating_supply": 17315325,            "total_supply": 17315325,            "max_supply": 21000000,            "date_added": "2013-04-28T00:00:00.000Z",            "num_market_pairs": 6324,            "cmc_rank": 1,            "last_updated": "2018-10-10T23:39:30.000Z",            "quote": {                "USD": {                    "price": 6593.07839577,                    "volume_24h": 3781676977.17942,                    "percent_change_1h": 0.299567,                    "percent_change_24h": -0.633751,                    "percent_change_7d": 0.95377,                    "market_cap": 114161295173.23618,                    "last_updated": "2018-10-10T23:39:30.000Z"                }            }        },        {            "id": 1027,            "name": "Ethereum",            "symbol": "ETH",            "slug": "ethereum",            "circulating_supply": 102492584.749,            "total_supply": 102492584.749,            "max_supply": null,            "date_added": "2015-08-07T00:00:00.000Z",            "num_market_pairs": 4471,            "cmc_rank": 2,            "last_updated": "2018-10-10T23:39:42.000Z",            "quote": {                "USD": {                    "price": 225.917892111,                    "volume_24h": 1382931374.09404,                    "percent_change_1h": 0.569086,                    "percent_change_24h": -0.704234,                    "percent_change_7d": 1.85499,                    "market_cap": 23154908703.502106,                    "last_updated": "2018-10-10T23:39:42.000Z"                }            }        },        {            "id": 52,            "name": "XRP",            "symbol": "XRP",            "slug": "ripple",            "circulating_supply": 39997634397,            "total_supply": 99991817275,            "max_supply": 100000000000,            "date_added": "2013-08-04T00:00:00.000Z",            "num_market_pairs": 230,            "cmc_rank": 3,            "last_updated": "2018-10-10T23:40:09.000Z",            "quote": {                "USD": {                    "price": 0.461478997819,                    "volume_24h": 411665959.164198,                    "percent_change_1h": 0.445149,                    "percent_change_24h": -3.5819,                    "percent_change_7d": -13.0624,                    "market_cap": 18458068236.65832,                    "last_updated": "2018-10-10T23:40:09.000Z"                }            }        },        {            "id": 1831,            "name": "Bitcoin Cash",            "symbol": "BCH",            "slug": "bitcoin-cash",            "circulating_supply": 17395525,            "total_supply": 17395525,            "max_supply": 21000000,            "date_added": "2017-07-23T00:00:00.000Z",            "num_market_pairs": 342,            "cmc_rank": 4,            "last_updated": "2018-10-10T23:39:36.000Z",            "quote": {                "USD": {                    "price": 512.630430925,                    "volume_24h": 369198906.22134,                    "percent_change_1h": 0.365485,                    "percent_change_24h": -0.745641,                    "percent_change_7d": -1.26889,                    "market_cap": 8917475476.91661,                    "last_updated": 
"2018-10-10T23:39:36.000Z"                }            }        },        {            "id": 1765,            "name": "EOS",            "symbol": "EOS",            "slug": "eos",            "circulating_supply": 906245117.6,            "total_supply": 1006245119.9339,            "max_supply": null,            "date_added": "2017-07-01T00:00:00.000Z",            "num_market_pairs": 174,            "cmc_rank": 5,            "last_updated": "2018-10-10T23:39:36.000Z",            "quote": {                "USD": {                    "price": 5.88204493255,                    "volume_24h": 532584590.023657,                    "percent_change_1h": 0.326144,                    "percent_change_24h": -0.205986,                    "percent_change_7d": 4.79873,                    "market_cap": 5330574501.627259,                    "last_updated": "2018-10-10T23:39:36.000Z"                }            }        },        {            "id": 512,            "name": "Stellar",            "symbol": "XLM",            "slug": "stellar",            "circulating_supply": 18890616791.6665,            "total_supply": 104363617748.491,            "max_supply": null,            "date_added": "2014-08-05T00:00:00.000Z",            "num_market_pairs": 120,            "cmc_rank": 6,            "last_updated": "2018-10-10T23:39:19.000Z",            "quote": {                "USD": {                    "price": 0.241243489647,                    "volume_24h": 40002371.0719364,                    "percent_change_1h": 0.417773,                    "percent_change_24h": -1.36287,                    "percent_change_7d": -0.740479,                    "market_cap": 4557238316.405842,                    "last_updated": "2018-10-10T23:39:19.000Z"                }            }        },

What I need to do is:
- extract the timestamp, which I'm doing successfully
- extract each coin and all of its related information, including the nested "quote", as a single event (each coin's info starts after the "id" and finishes on the "},")
- suppress all info before "data"

My current sourcetype looks like this:
[lalaland]
DATETIME_CONFIG =
KV_MODE = json
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3NT
TIME_PREFIX = \"timestamp\":
TZ = Australia/Sydney
category = Custom
pulldown_type = 1
BREAK_ONLY_BEFORE = \s*{\s*\"id\":
disabled = false
MAX_EVENTS = 10000

What am I doing wrong?

Tks!

0 Karma
1 Solution

FrankVl
Ultra Champion

I prefer to use the more explicit LINE_BREAKER setting and set SHOULD_LINEMERGE to false.

Also: taking the timestamp from the 'header' and breaking the rest into separate events may not really be an option. Splunk expects a timestamp per (broken) event, not one timestamp followed by a bunch of timestamp-less events. It might be possible to get that to work, but I'm not sure if and how.
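
If per-event timestamps really weren't available, one blunt fallback (not what's done below, and it discards the log's own times entirely) would be to stamp each event with the time it is indexed:

DATETIME_CONFIG = CURRENT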

Can you not use the last update time in each of the coin sections as the timestamp? Then you could use a config somewhat like this:

[lalaland]
KV_MODE = json
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3NT
TIME_PREFIX = \"last_updated\"\:
TZ = Australia/Sydney
category = Custom
pulldown_type = 1
LINE_BREAKER = (?:(^[^\[]+\[\s+)\{\s+\"id\"\:|(,\s+)\{\s+\"id\"\:)
disabled = false
MAX_EVENTS = 10000

regex101 link for the linebreaker regex: https://regex101.com/r/UfbLif/1
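
For anyone decoding that regex: LINE_BREAKER breaks events where the regex matches and discards whatever the capturing group matched. A rough annotation of the two alternatives (my reading, not from the original post):

(^[^\[]+\[\s+)\{\s+\"id\"\:    the first alternative consumes everything from the start of the chunk up to and including the opening "[" of "data", so the status header is dropped before the first coin
(,\s+)\{\s+\"id\"\:            the second alternative consumes the comma and whitespace between coin objects, so each subsequent { "id": ... starts a new event

That discard behavior is also what satisfies the requirement that everything before "data" be suppressed.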


harsmarvania57
Ultra Champion

Perfect. I didn't read the question properly; it asks to break events out for every coin. Still, I am getting a timestamp recognition error on 7.2 in my lab (maybe due to the %3NT).

0 Karma

FrankVl
Ultra Champion

Oh, yes, I didn't double-check that part. That should be a Z instead of a T, of course. Or even a %Z, if you actually want to capture that Z as the indicator for the Zulu (UTC) timezone.
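
Applied to the config above, the corrected line would read:

TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3NZ

or, with the zone captured rather than matched literally:

TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%Z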

Bit strange, by the way, to set TZ = Australia/Sydney while the logs carry that Z, meaning the Zulu (UTC) timezone...

0 Karma

ADRIANODL
Explorer

Hi Frank,
This is because the source of that info is based in the US, whilst we're based in AU.
:)

Thanks for the response, guys! Seems to be working fine!

0 Karma

FrankVl
Ultra Champion

How does your location matter for the TZ you set? If the information is logged in the UTC timezone (which the Z indicates), you should tell Splunk to interpret it as UTC, not as Australia time.

If you want things displayed in Australia time, you do that with your user's timezone settings in Splunk Web, not with props.conf. Telling Splunk to index UTC logs as Australia/Sydney will cause it to put skewed values into _time.
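
In props.conf terms, a minimal sketch of that advice (reusing the stanza name from above):

[lalaland]
TZ = UTC

Each user then picks their display timezone in their own account preferences in Splunk Web. And if TIME_FORMAT captures the zone with %Z, the timezone parsed from the event itself should take precedence, leaving TZ as a fallback.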

0 Karma

harsmarvania57
Ultra Champion

Hi @ADRIANODL,

Have you tried the configuration below on the Universal Forwarder?

props.conf

[yourSourcetype]
INDEXED_EXTRACTIONS = json
0 Karma

FrankVl
Ultra Champion

That (on its own) is not going to help him split this data into separate events per coin.
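
For completeness: if the whole response were indexed as one large event, a search-time split is still possible. A sketch in SPL (not from the thread; it reuses the sourcetype name from above and assumes _raw holds the full JSON):

sourcetype=lalaland
| spath path=data{} output=coin
| mvexpand coin
| spath input=coin
| table name symbol cmc_rank quote.USD.price quote.USD.last_updated

Here spath pulls each element of the "data" array into a multivalue field, mvexpand fans those out into one result per coin, and the second spath extracts the per-coin fields, including the nested "quote". It does not fix _time per coin, though, which is why the index-time LINE_BREAKER approach is the accepted answer.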

0 Karma