Solved: How to keep Distinct fields correlated after mergi...

chrisw3 · ‎07-18-2017

Quick explanation of my Data format:

Sourcetype "A"
Field_ID, Field_Name

Sourcetype "B"
Field_ID, Interesting_FieldA, Interesting_Field_Timestamp

I feel like I'm trying to do something fairly straightforward but I'm missing something obvious. I'm trying to create a table that includes information from both Sourcetype "A" & "B", but that keeps the timestamps associated with unique events.

My search looks something a little like this:
(filtering and some evals)
| fields Field_ID,Interesting_Field,Interesting_Field_Timestamp,Field_Name
| stats values(*) as * by Field_ID

This works perfectly for Names that have singular events. For other events that have duplicates, I wind up with MV fields for Interesting_Field and Interesting_Field_Timestamp.

That's not inherently a problem, but I'd like to be able to separate those events out into their own separate timestamps so I can view them on a timeline. I've tried using mvexpand, but that seems to wind up with both events having both timestamps rather than each event retaining its own timestamp.
E.g. ** | mvexpand Interesting_Field **

i've also tried to tackle this prior to doing the stats command by using fillnull to allow me to use the stats command on multiple fields. Unfortunately this winds up with the Name field never actually appending to the interesting field, making the table less useful.
E.g ** | fillnull Interesting_Field value="" | stats values(*) as * by Interesting_Field,Field_Id **

Am I stuck using a ** | join ** command in order to make this function as I want? I feel like there has to be a better way than that, but I'm running low on ideas.

sideview · ‎07-18-2017

When the problem is that "association between my N fields is being lost in my transforming commands", often the answer is to glue the N things together before the stats, and then unpack them later. This is messy.

| eval Interesting_Field_Foo =Interesting_Field + "::" + Interesting_Field_Timestamp
| fields Field_ID,Interesting_Field_Foo,Field_Name
| stats values(*) as * by Field_ID
| mvexpand Interesting_Field_Foo
| eval Interesting_Field_Foo=split(Interesting_Field_Foo,"::")
| eval Interesting_Field=mvindex(Interesting_Field_Foo,0)
| eval Interesting_Field_Timestamp = mvindex

However if you were actually interested in more than one field value from A, this becomes a lot less viable as a solution. What you'll get in your case is fine I think -- one row per value of Interesting_Field, with Field_Name carried along on all those events. Probably fine. However if sourcetype A had 5 fields, and many events per Field_Id.... this would be pretty horrible - the mvexpand would multiply all those out and make an unusable mess...

As a further comment, this "glue the N things together before the stats and then split them up after the stats" comes up most often when you need to use the chart command with more than one "group by" field. Including this example just cause it might give you more perspective and help you remember the trick later:

eg: if you want to do

| chart count over name group subgroup by type

well, you can't. chart command only allows a single group by field (boo!)

So you have to do this loveliness:

| eval nameGroupSubgroup=name + "---" + group + "---" + subgroup  
| chart count over nameGroupSubgroup by type 
| eval nameGroupSubgroup=split(nameGroupSubgroup,"---") 
| eval name=mvindex(nameGroupSubgroup,0) 
| eval group=mvindex(nameGroupSubgroup,1) 
| eval subgroup=mvindex(nameGroupSubgroup,2) 
| fields - nameGroupSubgroup
| fields name group subgroup *

View solution in original post

sideview · ‎07-18-2017

When the problem is that "association between my N fields is being lost in my transforming commands", often the answer is to glue the N things together before the stats, and then unpack them later. This is messy.

| eval Interesting_Field_Foo =Interesting_Field + "::" + Interesting_Field_Timestamp
| fields Field_ID,Interesting_Field_Foo,Field_Name
| stats values(*) as * by Field_ID
| mvexpand Interesting_Field_Foo
| eval Interesting_Field_Foo=split(Interesting_Field_Foo,"::")
| eval Interesting_Field=mvindex(Interesting_Field_Foo,0)
| eval Interesting_Field_Timestamp = mvindex

However if you were actually interested in more than one field value from A, this becomes a lot less viable as a solution. What you'll get in your case is fine I think -- one row per value of Interesting_Field, with Field_Name carried along on all those events. Probably fine. However if sourcetype A had 5 fields, and many events per Field_Id.... this would be pretty horrible - the mvexpand would multiply all those out and make an unusable mess...

As a further comment, this "glue the N things together before the stats and then split them up after the stats" comes up most often when you need to use the chart command with more than one "group by" field. Including this example just cause it might give you more perspective and help you remember the trick later:

eg: if you want to do

| chart count over name group subgroup by type

well, you can't. chart command only allows a single group by field (boo!)

So you have to do this loveliness:

| eval nameGroupSubgroup=name + "---" + group + "---" + subgroup  
| chart count over nameGroupSubgroup by type 
| eval nameGroupSubgroup=split(nameGroupSubgroup,"---") 
| eval name=mvindex(nameGroupSubgroup,0) 
| eval group=mvindex(nameGroupSubgroup,1) 
| eval subgroup=mvindex(nameGroupSubgroup,2) 
| fields - nameGroupSubgroup
| fields name group subgroup *

chrisw3 · ‎07-20-2017

This is a great answer. I was actually reviewing your presentation from .Conf 2016 looking for something like this.

I do have some other instances (although none handy) where I was trying to do this across multiple fields. As you mentioned, this obviously is hard to scale. Do you have any thoughts or examples you could pass on for how to handle this for multiple fields?

How to keep Distinct fields correlated after merging with Stats command?

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!