Extracting delimited values from a field with a dy...

jpawloski · ‎06-27-2018

I have a field that contains column names delimited by spaces that I need to break out into separate fields for filtering purposes. The values are simply the column names, like shown below:

THIS_COLUMN THAT_COLUMN

I currently have an mvexpand solution in place which works but takes a huge amount of time to process through the duplicate events. I really want to parse the column names out into separate columns in the same event and pass them through a foreach statement, or something similar.

I've seen solutions that would work given a finite amount of column names but not for when the amount of values vary. Is there a structure I can apply to recognize the amount of columns and add them to the event. Here's my current search for reference:

base search | eval UID = _cd + index+splunk_server | eval singleColumns=split(column_name, " ") |mvexpand singleColumns | search NOT [|inputlookup Known_Bad_Columns | rename bad_columns as singleColumns ] | dedup UID | stats count by field1, field2 | sort by count desc

woodcock · ‎04-07-2019

Your search is broken, even as-is. It is doing nothing. Here is why.

This gives you events, each with a unique UID:

base search
| eval UID = _cd + index+splunk_server

This breaks each event, into several events, EACH WITH THE SAME UID AS THE ORIGINAL JOINED/PARENT EVENT:

| eval singleColumns=split(column_name, " ")
|mvexpand singleColumns

This drops SOME of those events, but the ones that are left still share the original UID:

| search NOT [|inputlookup Known_Bad_Columns | rename bad_columns as singleColumns ]

Here is the problem. This throws away all events for each value of UID except for the one on top. This means that the very first field in column_name is the only event that is left. This makes no sense to me and probably is not what you are intending to do. You need to remove all the SPL after the following line and take a good hard look at what the following command is doing. It is not what you think:

| dedup UID

I cannot comment on this but presumably field1 and field2 are 2 of the columns in column_name, right?

| stats count by field1, field2

You mus ALWAYS use a number with sort:

| sort 0 count desc

dmarling · ‎04-05-2019

Can you please provide a couple of test examples of how this data appears in the raw events? This will make it easier for us to see the specific issue you are running into.

If this comment/answer was helpful, please up vote it. Thank you.

Extracting delimited values from a field with a dynamic length?

Welcome to the Splunk Community!

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Adoption of RUM and APM at Splunk