Splunk Search

how to get correlated data with stats

brettcave
Builder

Is there a way to create a relationship between 2 fields, using a row concept, for use with stats or chart?

A report I am building uses stats to get latest values per user from the raw data. In the data, a UserID is unique to each user, but is only present in specific events ("auth" events). All events are tagged with a VisitorID, which can change (i.e. each user can have multiple visitorId's).

Here's an example to show how I have built my report

eventtype="SomeEvent" OR eventtype="OtherEvent" OR eventtype="Authentication"
| eval SomeTime=if(eventtype="SomeEvent",_time,NULL)
| eval OtherTime=if(eventtype="OtherTime",_time,NULL)
| stats latest(UserID) latest(SomeValue) as SomeValue latest(SomeTime) as SomeTime latest(OtherTime) as OtherTime latest(OtherValue) as OtherValue by VisitorID

From here, I get a list of UserID's, *Values and *Times by VisitorID:

VisitorID UserID  SomeValue   SomeTime  OtherValue  OtherTime
a         001                           1           1012
b         001     3           987       3           1007
c         001     5           1008      

What I am trying to do here for my report is to find the most recent *Value by User (the stats command above looses _time field, so I propogate it with eval via EVENTTime)

... | stats SomeValue[max(SomeTime)] as SomeValue OtherValue[max(OtherTime)] by UseerID

This would give me the correct data: UserID 001, SomeValue 5, OtherValue 3

is there a way to do this with chart or stats? Some sort of compound use of stats / chart functions?

Here's a set of example data that could produce the data above by VisitorID:

visitor: b; userid: 001; type: authentication
visitor: b; type: someEvent; someValue: 3
visitor: b; type: otherEvent; otherValue: 1
visitor: c; userid: 001; type: authentication
visitor: c; type: someEvent; someValue: 5
visitor: a; userid: 001; type: authentication
visitor: a; type: otherEvent; otherValue: 3    

thanks
brett

0 Karma
1 Solution

brettcave
Builder

In order to get this to work, and get the most recent event per userId, regardless of visitor ID, I used a concatenation with the time, combined with a sort, and then split with mvindex later on. Here's an example:

eventtype="SomeEvent" OR eventtype="OtherEvent" OR eventtype="Authentication"
| eval SomeDetail=if(eventtype="SomeEvent",_time." ".SomeValue,NULL)
| eval OtherDetail=if(eventtype="OtherEvent",_time." ".OtherValue,NULL)
| stats latest(UserID) as UserID latest(SomeDetail) as SomeDetail latest(OtherDetail) as OtherDetail by VisitorID
| stats max(SomeDetail) as SomeDetail max(OtherDetail) as OtherDetail by UserID
| eval SomeValue=mvindex(split(SomeDetail," "),1)
| eval OtherValue=mvindex(split(OtherDetail," "),1)

This way, by selecting the max per UserID i get the latest *Detail field, regardless of what the visitorID was (the max has the largest _time).

View solution in original post

0 Karma

brettcave
Builder

In order to get this to work, and get the most recent event per userId, regardless of visitor ID, I used a concatenation with the time, combined with a sort, and then split with mvindex later on. Here's an example:

eventtype="SomeEvent" OR eventtype="OtherEvent" OR eventtype="Authentication"
| eval SomeDetail=if(eventtype="SomeEvent",_time." ".SomeValue,NULL)
| eval OtherDetail=if(eventtype="OtherEvent",_time." ".OtherValue,NULL)
| stats latest(UserID) as UserID latest(SomeDetail) as SomeDetail latest(OtherDetail) as OtherDetail by VisitorID
| stats max(SomeDetail) as SomeDetail max(OtherDetail) as OtherDetail by UserID
| eval SomeValue=mvindex(split(SomeDetail," "),1)
| eval OtherValue=mvindex(split(OtherDetail," "),1)

This way, by selecting the max per UserID i get the latest *Detail field, regardless of what the visitorID was (the max has the largest _time).

0 Karma

jonuwz
Influencer

So the problem you really need to solve is how to get the UserID in each of the events based on VisitorID. I think the rest is trivial after that ?

If VisitorID is also unique,

... | eventstats first(UserId) as UserId by VisitorID  

Will populate the correct UserId in each event, and then you can use UserId in your 'by' clauses.

brettcave
Builder

sorry, to be more precise, I was getting the UserID into each event. In my example above, i was doing a latest(UserID) in the first "stats" command (my actual search in splunk was latest(UserID) as UserID, which is exactly as you suggested 🙂

0 Karma

brettcave
Builder

well not quite - i was doing a first, or doing a last with a sort between the 2 stats, but this could give incorrect "last" or "first" values in teh 2nd stats not necessarily being the most recent. I'll post the way I got it to work below...

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...