Splunk Search

Create Distinct Records from stats

madakkas
Explorer

I have a question where in I have inputs as below in a file f1.csv

JOB NAME    Start_Time  End_Time
Job1    S11 
Job2    S2  
Job3    S3  
Job1    S12 
Job4    S4  
Job1        E11
Job2        E3
Job3        E3
Job1        E12
Job4        E4

and when i run the below command on the data
|inputlookup f1.csv
|stats values(End_Time) as End_Time values(Start_Time) as Start_Time by "JOB NAME"

i get the output as below

JOB NAME    End_Time    Start_Time
Job1    E11 S11
    E12 S12
Job2    E3  S2
Job3    E3  S3
Job4    E4  S4

Whereas i am looking to create an output as below

JOB NAME    Start_Time  End_Time
Job1    S11 E11
Job2    S2  E3
Job3    S3  E3
Job1    S12 E12
Job4    S4  E4
Tags (1)
0 Karma
1 Solution

493669
Super Champion

Hi @madakkas,
try this:

|inputlookup f1.csv|stats values(*) as * by "JOB NAME" |eval combined=mvzip(Start_Time,End_Time)|fields - Start_Time,End_Time|mvexpand combined| makemv combined delim="," | eval Start_Time=mvindex(combined, 0)| eval End_Time=mvindex(combined, 1)|table "JOB NAME" Start_Time End_Time

View solution in original post

acharlieh
Influencer

So one major assumption that I'm making here, is how to pair up Start and End times. My answer assumes that the first Start time goes with the first End time for each job:

|inputlookup f1.csv
|stats list(*_Time) as *_Time by "JOB NAME"
|mvexpand Start_Time
|streamstats count by "JOB NAME"
|eval End_Time=mvindex(End_Time,count-1)
|fields - count

So what we're doing here, is we're collecting all the times preserving order and duplicate values by using list() instead of values() Using mvexpand, We then expand out any duplicate Start_Times making a row for each. We then have to figure out which End_Time goes with each Start_Time, here we use streamstats to count the entries for each job, and then using eval we can pull the Nth End_Time out. (We wouldn't want to use mvexpand on both Start and End times because that would give us the cross product, which doesn't seem to be what you're after.)

For a similar run anywhere example

|makeresults | eval _raw="Job,Start,End
Job1,S11, 
Job2,S2, 
Job3,S3, 
Job1,S12, 
Job4,S4, 
Job1, ,E11
Job2, ,E3
Job3, ,E3
Job1, ,E12
Job4, ,E4" | multikv | fields Job Start End | eval Start=if(trim(Start)=="",null(),trim(Start)),End=if(trim(End)=="",null(),trim(End)) 
| stats list(*) as * by Job | mvexpand Start | streamstats count by Job | eval End=mvindex(End,count-1)
0 Karma

493669
Super Champion

Hi @madakkas,
try this:

|inputlookup f1.csv|stats values(*) as * by "JOB NAME" |eval combined=mvzip(Start_Time,End_Time)|fields - Start_Time,End_Time|mvexpand combined| makemv combined delim="," | eval Start_Time=mvindex(combined, 0)| eval End_Time=mvindex(combined, 1)|table "JOB NAME" Start_Time End_Time

madakkas
Explorer

I have no idea what is happening , but it simply worked as needed... thank You

0 Karma

deepashri_123
Motivator

Hey madakkas,

You just need to change the query as below:
|inputlookup f1.csv
|stats values(End_Time) as End_Time by "JOB NAME",Start_Time
Hope this helps!!!

0 Karma

madakkas
Explorer

Sorry to say, but it did not help Deepashri. It ended up giving JOBNAME and the Start_Time , but the End_Time column went empty as below
JOB NAME Start_Time End_Time
Job1 S11
Job1 S12
Job2 S2

Job3 S3

Job4 S4

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...