Solved: Create Distinct Records from stats

madakkas · ‎02-26-2018

I have a question where in I have inputs as below in a file f1.csv

JOB NAME    Start_Time  End_Time
Job1    S11 
Job2    S2  
Job3    S3  
Job1    S12 
Job4    S4  
Job1        E11
Job2        E3
Job3        E3
Job1        E12
Job4        E4

and when i run the below command on the data
|inputlookup f1.csv
|stats values(End_Time) as End_Time values(Start_Time) as Start_Time by "JOB NAME"

i get the output as below

JOB NAME    End_Time    Start_Time
Job1    E11 S11
    E12 S12
Job2    E3  S2
Job3    E3  S3
Job4    E4  S4

Whereas i am looking to create an output as below

JOB NAME    Start_Time  End_Time
Job1    S11 E11
Job2    S2  E3
Job3    S3  E3
Job1    S12 E12
Job4    S4  E4

493669 · ‎02-26-2018

Hi @madakkas,
try this:

|inputlookup f1.csv|stats values(*) as * by "JOB NAME" |eval combined=mvzip(Start_Time,End_Time)|fields - Start_Time,End_Time|mvexpand combined| makemv combined delim="," | eval Start_Time=mvindex(combined, 0)| eval End_Time=mvindex(combined, 1)|table "JOB NAME" Start_Time End_Time

View solution in original post

acharlieh · ‎02-26-2018

So one major assumption that I'm making here, is how to pair up Start and End times. My answer assumes that the first Start time goes with the first End time for each job:

|inputlookup f1.csv
|stats list(*_Time) as *_Time by "JOB NAME"
|mvexpand Start_Time
|streamstats count by "JOB NAME"
|eval End_Time=mvindex(End_Time,count-1)
|fields - count

So what we're doing here, is we're collecting all the times preserving order and duplicate values by using list() instead of values() Using mvexpand, We then expand out any duplicate Start_Times making a row for each. We then have to figure out which End_Time goes with each Start_Time, here we use streamstats to count the entries for each job, and then using eval we can pull the Nth End_Time out. (We wouldn't want to use mvexpand on both Start and End times because that would give us the cross product, which doesn't seem to be what you're after.)

For a similar run anywhere example

|makeresults | eval _raw="Job,Start,End
Job1,S11, 
Job2,S2, 
Job3,S3, 
Job1,S12, 
Job4,S4, 
Job1, ,E11
Job2, ,E3
Job3, ,E3
Job1, ,E12
Job4, ,E4" | multikv | fields Job Start End | eval Start=if(trim(Start)=="",null(),trim(Start)),End=if(trim(End)=="",null(),trim(End)) 
| stats list(*) as * by Job | mvexpand Start | streamstats count by Job | eval End=mvindex(End,count-1)

493669 · ‎02-26-2018

Hi @madakkas,
try this:

|inputlookup f1.csv|stats values(*) as * by "JOB NAME" |eval combined=mvzip(Start_Time,End_Time)|fields - Start_Time,End_Time|mvexpand combined| makemv combined delim="," | eval Start_Time=mvindex(combined, 0)| eval End_Time=mvindex(combined, 1)|table "JOB NAME" Start_Time End_Time

madakkas · ‎02-26-2018

I have no idea what is happening , but it simply worked as needed... thank You

deepashri_123 · ‎02-26-2018

Hey madakkas,

You just need to change the query as below:
|inputlookup f1.csv
|stats values(End_Time) as End_Time by "JOB NAME",Start_Time
Hope this helps!!!

madakkas · ‎02-26-2018

Sorry to say, but it did not help Deepashri. It ended up giving JOBNAME and the Start_Time , but the End_Time column went empty as below
JOB NAME Start_Time End_Time
Job1 S11
Job1 S12
Job2 S2

Job3 S3

Job4 S4

Create Distinct Records from stats

.conf24 | Registration Open!

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!