Splunk Search

How to add multi-value key-value pairs to the current event?

cdieringerwm
Observer

Greetings.

Suppose I have an event schema of just a URL, where the query string of the URL may change:

 

```ndjson
{ "url": "/?a=b&c=d" }
{ "url": "/?arbitraryKey=arbitraryValue" }
```

The KV pairs are arbitrary.

 

From a sibling thread, I have already extracted keys and values from the query params into multi-value fields. For example:

 

```
// pseudo-code of Splunk variables, first event
keys   = MultiValue [ a, c ]
values = MultiValue [ b, d ]

// pseudo-code of Splunk variables, second event
keys   = MultiValue [ arbitraryKey ]
values = MultiValue [ arbitraryValue ]
```
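
For reference, the extraction looks roughly like this (the exact rex is from the sibling thread, so treat this as approximate):

```
| rex field=url max_match=0 "[?&](?<keys>[^=&]+)=(?<values>[^&]*)"
```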

I want each KV pair added to the associated event so that, at the end of my query, I can compute interesting stats. For example, I may want to `| table a, c, arbitraryKey` and see two records corresponding to the input events above:

 

```txt
a,    c,    arbitraryKey
------------------------
b,    d,    null
null, null, arbitraryValue
```

 

Simply put, I want to derive a set of KV pairs from an event, then merge them back into that event.

 

`mvexpand` creates new rows, so that's not what I want.

 

What other options do I have?


yuanliu
SplunkTrust

The way you are pursuing this is quite difficult in SPL. However, SPL can achieve what you ask if certain conditions apply; it depends on your data.

If you know all the query variables, and both the number of variables and the number of events are small, you can simply do

```
| fillnull value="null"
| eventstats list(a) as a list(c) as c list(arbitraryKey) as arbitraryKey
```
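
For example, against your two sample events this behaves like the following self-contained sketch (makeresults and streamstats here are only a test harness standing in for your real events):

```
| makeresults count=2
| streamstats count as n
| eval a = if(n=1, "b", null()), c = if(n=1, "d", null()), arbitraryKey = if(n=2, "arbitraryValue", null())
| fillnull value="null" a c arbitraryKey
| eventstats list(a) as a list(c) as c list(arbitraryKey) as arbitraryKey
| table a c arbitraryKey
```

After the eventstats, every event carries each field as a multivalue column holding all events' values in event order, with "null" placeholders where a key was absent.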

If enumerating the query keys is too exhausting but the number of events is still under 100, you can use a wildcard

```
| fillnull value="null"
| eventstats list(*) as * by a_unique_field
```

where a_unique_field is any field that is unique to each event, or a combination of fields that is unique (a primary key), and no field named in the groupby is itself a URL query variable.

If you have more than 100 events, the ask is perhaps unreasonable: the list() function returns at most 100 values.


cdieringerwm
Observer

Hi yuanliu,

Thanks for the tips. If you wouldn't mind assisting a bit further, that'd be amazing.

> The way you are pursuing this is quite difficult in SPL

I'm happy to do something else. More or less, I'm just trying to map one event into another, deriving the second from the first. I tend to think Splunk is great at map-reduce problems... but I agree, I'm not sure why this one is giving me a headache.

 

> If you know all the query variables ... 

I could enumerate a dozen or so keys of core interest to satisfy my needs.

 

> If you have more than 100 events

I have about 25 events per second :), so generally much more than 100.

 

```
| fillnull value="null"
| eventstats list(a) as a list(c) as c list(arbitraryKey) as arbitraryKey
```

In my example above, I have multi-value fields. I don't see multi-value variables being used in your example. Did I miss something?

 

Much appreciated.


yuanliu
SplunkTrust

> In my example above, I have multi-value fields. I don't see multi-value variables being used in your example. Did I miss something?

Your goal is to have the fields 'a', 'c', 'arbitraryKey' (which Splunk already extracts) presented with the result you wanted. Those multi-valued fields 'keys' and 'values' are not useful; in fact, they only get in the way because of how SPL works.

Now, the central problem is the number of events. Here, you need to clarify the requirement: do you want to retain order in each field 'a', 'c', 'arbitraryKey', etc.? If not, simply replace list with values; there is no limit to the number of values. But each field will be sorted in its own ASCII order, therefore losing all correlation, and the number of values in each field will also likely differ.
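
For example, the unordered variant of the earlier search would simply be:

```
| fillnull value="null"
| eventstats values(a) as a values(c) as c values(arbitraryKey) as arbitraryKey
```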

If order is important, you can technically still get what you want. I will demonstrate one technique, but I highly doubt it's worth it if you already worry about the cost of mvexpand.

First, the technique. This illustration uses your knowledge about the dozen or so fields of interest; extend each field list the same way for the remaining ones.

 

```
| fillnull value="null"
| eval of_interest = a . ":" . c . ":" . arbitraryKey . ":" . other1 . ":" . other2
| eval interest_order = mvappend("a", "c", "arbitraryKey", "other1", "other2")
| eventstats values(of_interest) as of_interest
| foreach a c arbitraryKey other1 other2
    [ eval <<FIELD>> = mvmap(of_interest, mvindex(split(of_interest, ":"), mvfind(interest_order, "^<<FIELD>>$"))) ]
```
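
To see it end to end, here is a self-contained sketch against your two sample events (again, makeresults and streamstats only stand in for real events):

```
| makeresults count=2
| streamstats count as n
| eval a = if(n=1, "b", null()), c = if(n=1, "d", null()), arbitraryKey = if(n=2, "arbitraryValue", null())
| fillnull value="null" a c arbitraryKey
| eval of_interest = a . ":" . c . ":" . arbitraryKey
| eval interest_order = mvappend("a", "c", "arbitraryKey")
| eventstats values(of_interest) as of_interest
| foreach a c arbitraryKey
    [ eval <<FIELD>> = mvmap(of_interest, mvindex(split(of_interest, ":"), mvfind(interest_order, "^<<FIELD>>$"))) ]
| table a c arbitraryKey
```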

 

Here, the mvmap and split are repeated n times in each row. Is this really less expensive than mvexpand? Maybe you should think through the actual "interesting stats" and decide whether adding a dozen fields with thousands of values to each of thousands of events is truly that useful. And even if it is, maybe of_interest as a whole is better saved into a lookup file instead of as multivalue fields in events.
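
A minimal sketch of that lookup alternative (the lookup file name of_interest.csv is hypothetical):

```
| stats count by of_interest
| fields - count
| outputlookup of_interest.csv
```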
