How to format nested data using key-value structure

adamcohen
New Member

The Splunk best practices document recommends:

Use clear key-value pairs

key1=value1, key2=value2, key3=value3 . . .

This makes sense for simple data that can be represented in key-value format, but what about nested data structures? For example, what's the best way of representing the following log data using key-value format?

{
  "categories": [
    "Restaurants",
    "American (New)",
    "Southern"
  ],
  "attributes": {
    "BusinessParking": {
      "street": false,
      "garage": true
    },
    "WheelchairAccessible": true,
    "GoodForKids": false
  },
  "stars": 4.5,
  "city": "Las Vegas",
  "name": "Yardbird Southern Table & Bar"
}

I can represent the attributes and top-level keys using dotted notation:

attributes.BusinessParking.street="false",
attributes.BusinessParking.garage="true",
attributes.WheelchairAccessible="true",
attributes.GoodForKids="false",
stars="4.5",
city="Las Vegas",
name="Yardbird Southern Table & Bar",

Although I'm not sure if this is optimal.
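For reference, here is a minimal Python sketch of how the dotted-notation line above could be generated from the nested record. The helper names `flatten` and `to_kv` are hypothetical, arrays are deliberately skipped, and values that themselves contain double quotes are not handled; this is an illustration under those assumptions, not a Splunk-recommended encoder.

```python
import json


def flatten(obj, prefix=""):
    """Yield (dotted_key, value) pairs for nested dicts. Lists are skipped."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        elif isinstance(value, list):
            continue  # arrays need their own convention, so leave them out here
        else:
            yield path, value


def to_kv(record):
    """Render a record as key="value" pairs separated by ", "."""
    parts = []
    for key, value in flatten(record):
        # json.dumps renders booleans as true/false and numbers as numerals.
        # Assumption: string values contain no double quotes.
        text = value if isinstance(value, str) else json.dumps(value)
        parts.append(f'{key}="{text}"')
    return ", ".join(parts)


record = {
    "categories": ["Restaurants", "American (New)", "Southern"],
    "attributes": {
        "BusinessParking": {"street": False, "garage": True},
        "WheelchairAccessible": True,
        "GoodForKids": False,
    },
    "stars": 4.5,
    "city": "Las Vegas",
    "name": "Yardbird Southern Table & Bar",
}

kv_line = to_kv(record)
print(kv_line)
```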

However, my main question is: how should I represent the categories array?

I need to be able to search the above data and return all records that have more than N categories, so how should my data be structured to make such a query as efficient as possible?

The reason I'm asking is that we currently store our logs in JSON format, and I can run the above query against JSON data using spath. However, some people in my organization believe that spath is very slow and that key-value is much faster, and they want to change our logging format from JSON to key-value. I'd like to compare both log structures, JSON and key-value, to understand which format is more efficient to query (if, in fact, there is any difference at all). At the moment, I can't even figure out how best to structure the key-value logs so that I can query array data.

0 Karma

cesarbmx
Engager

@adamcohen - what did you end up doing?
I am in the same situation as you. If Splunk recommends key-value pairs (which I also prefer over JSON), why doesn't it recommend a way to represent searchable arrays?

0 Karma

starcher
Influencer

If your data is in JSON, keep it that way and just set KV_MODE = json for your sourcetype in props.conf.

0 Karma

adamcohen
New Member

Thanks for the response, @starcher. However, I'm not trying to solve this problem for JSON-formatted logs; I already know how to do that, and it works well. The problem is how to solve it for key-value formatted logs, since my organization wants a clear comparison of JSON-formatted logs versus key-value. That is why I'm trying to figure out the best way to store a nested data structure in key-value format: I want to run the same queries against both JSON and key-value formatted data and summarise the advantages and disadvantages of each approach.

For example, to return all restaurants that have more than 15 categories, I can use the following query on JSON-formatted data:

source="business.json" | spath categories{} | where mvcount('categories{}') > 15

The above query requires using spath, which can be slow. In order to compare this to key-value, I need to first understand how to store the nested data (including the categories array) in key-value format, so I can then construct a query.
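One possible convention, sketched here purely as an assumption rather than anything the Splunk best-practices document prescribes, is to write the array as a single delimited field and to log its length as a separate field at write time. The function name `array_to_kv`, the `;` delimiter, and the `categories_count` field name are all hypothetical.

```python
def array_to_kv(name, values, delim=";"):
    """Encode a list of strings as one delimited field plus a count field."""
    for value in values:
        # Assumption: refuse values that would make the field ambiguous.
        if delim in value or '"' in value:
            raise ValueError(f"value {value!r} contains a reserved character")
    joined = delim.join(values)
    return f'{name}="{joined}", {name}_count={len(values)}'


categories = ["Restaurants", "American (New)", "Southern"]
line = array_to_kv("categories", categories)
print(line)
# categories="Restaurants;American (New);Southern", categories_count=3
```

With a line like this, the "more than N categories" filter could in principle become a numeric comparison on the hypothetical categories_count field, or the delimited field could be split at search time (for example, makemv delim=";" categories followed by mvcount(categories)). Whether either is actually faster than spath would still need to be measured on real data.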

0 Karma