All Apps and Add-ons

Help parsing JSON embedded in fields from DB Connect input

scottsavareseat
Path Finder

I'm using DB connect 3.2 on Splunk 8.0.1. I have an input that brings in data from a mysql database. It looks like thins:

2020-03-17 10:29:16.000, f1="10", f2="2020-03-17 10:29:16", f3="something", f4="{"really": ["long"], "and": {"deep": "json"}}", f5="casdca", f6="more"

Field f4 contains about 13k of JSON. When viewing the event right now, f1, f2, f3 come out find. Splunk can't see f4 as containing JSON so it isn't parsed. f5 and f6, which you'd think are parsed right are not. They appear as being part of the value of the JSON f4 field.

What I would like is a way to have f5 and f6 properly parsed as fields and f4 to be expanded fully as their own JSON fields so that I can pull out fields as part of the search. Being a huge bit of JSON search time extraction is preferred.

I've tried setting KV_MODE to JSON and I've tried playing with transforms logic:
[forseti_violations_all_fields]
REGEX = , ([a-zA-Z0-9-_]+)="(.*)"(, |$)
FORMAT = $1::$2
But none of it has worked at all.

Any ideas? Thanks!

0 Karma
1 Solution

scottsavareseat
Path Finder

I'm not suggesting this is an answer. However, as a workaround I developed a script that will pull from the mysql database, take the already json fields and convert them to python objects, then upload the event via Splunk HEC. The script itself is about 110 lines including comments and is reusable.

View solution in original post

0 Karma

scottsavareseat
Path Finder

I'm not suggesting this is an answer. However, as a workaround I developed a script that will pull from the mysql database, take the already json fields and convert them to python objects, then upload the event via Splunk HEC. The script itself is about 110 lines including comments and is reusable.

0 Karma

woodcock
Esteemed Legend

Perfect! Click Accept on your answer.

0 Karma

scottsavareseat
Path Finder

I'm wondering if it is easier to tell DB Connect to just write the event as a JSON blob to HEC instead of using field=value format. Is that possible?

0 Karma

woodcock
Esteemed Legend

Your RegEx is wrong; try this:

REGEX = ,\s+([a-zA-Z0-9-_]+)="(.*?)"(?=, |$)

See here:
https://regex101.com/r/bV62Ui/1

0 Karma

scottsavareseat
Path Finder

Thanks for the regex, but it doesn't work in a real example. Looks like the JSON sample above isn't truely indicative of the data. The JSON is much longer and has a lot of quote comma space patterns in it:
2020-03-17 10:29:16.000, f1="10", f2="2020-03-17 10:29:16", f3="something", f4="{"really": ["long"], "and": {"deep": "json", "test": "foo"}}", f5="casdca", f6="more"
Even that is wrong for some reason. In the real data f4 extracts to just '{' and I don't see f5 and f6 getting extracted at all.

I'm wondering if there is a way to tell DB Connect to submit the entire event as JSON instead of field=value.

Thanks, any more ideas?

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...