Splunk Search

How can I condense this data into distinct session times? [Easy-ish search question]

nick405060
Motivator

Data:

user      Source_Network_Address      session_start         session_end
bob       10.0.0.1                    01-18-2019 00:53      01-18-2019 00:59
bob       10.0.0.1                    01-18-2019 01:23    
bob       10.0.0.1                                          01-18-2019 01:29
bob       10.0.0.1                    01-18-2019 02:50    
bob       10.0.0.1                                          01-18-2019 02:52
joe       10.0.0.201                  01-18-2019 05:57      01-18-2019 05:57
joe       10.0.0.955                  01-18-2019 05:58    
joe       10.0.0.955                                        01-18-2019 06:17

Desired data:

user      Source_Network_Address      session_start         session_end
bob       10.0.0.1                    01-18-2019 00:53      01-18-2019 00:59
bob       10.0.0.1                    01-18-2019 01:23      01-18-2019 01:29
bob       10.0.0.1                    01-18-2019 02:50      01-18-2019 02:52                        
joe       10.0.0.201                  01-18-2019 05:57      01-18-2019 05:57
joe       10.0.0.955                  01-18-2019 05:58      01-18-2019 06:17

In Python, this would be super easy as shown below. But there HAS to be a super quick way to do it in a Splunk search, right?

data = [["bob", "10.0.0.1", "01-18-2019 00:53", "01-18-2019 00:59"],
        ["bob", "10.0.0.1", "01-18-2019 01:23", ""],
        ["bob", "10.0.0.1", "", "01-18-2019 01:29"],
        ...]
result = []
new_row = []
for row in data:
    if new_row == [] and row[3] == "":
        new_row = row              # start-only row: hold it open
    elif new_row == [] and row[3] != "":
        result.append(row)         # complete row: keep as-is
    else:
        new_row[3] = row[3]        # end-only row: close out the held row
        result.append(new_row)     # append the merged row
        new_row = []

nick405060
Motivator

The only way I can think to do it (not elegant, whatsoever) is:

... | search NOT (session_start!="" AND session_end!="")
| table session_end
| streamstats count as row_num
| outputcsv "temp.csv"
| table user Source_Network_Address
| streamstats count as row_num
| join type=left row_num [| inputcsv "temp.csv"]

And then somehow join it with (session_start!="" AND session_end!="") in the original data. Any better suggestions?


martin_mueller
SplunkTrust
SplunkTrust

Using the same approach as your Python code, and assuming the events are sorted correctly, you can use streamstats to carry the previous row's start time forward to the next row:

... | streamstats current=f window=1 global=f last(session_start) as last_session_start by user Source_Network_Address
| where isnotnull(session_end)
| eval session_start = coalesce(session_start, last_session_start)
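For anyone reading along, here is a minimal Python sketch (my own illustration, not code from the thread; field names shortened) of what that streamstats pipeline does: remember the previous row's session_start within each (user, Source_Network_Address) group, keep only rows that have a session_end, and fill a missing session_start from the remembered value.

```python
rows = [
    {"user": "bob", "addr": "10.0.0.1", "start": "01-18-2019 01:23", "end": None},
    {"user": "bob", "addr": "10.0.0.1", "start": None, "end": "01-18-2019 01:29"},
    {"user": "joe", "addr": "10.0.0.201", "start": "01-18-2019 05:57", "end": "01-18-2019 05:57"},
]

result = []
last_start = {}  # previous row's start per group, like the "by" clause
for row in rows:
    key = (row["user"], row["addr"])
    prev = last_start.get(key)            # streamstats current=f window=1 last(session_start)
    last_start[key] = row["start"]
    if row["end"] is not None:            # where isnotnull(session_end)
        row["start"] = row["start"] or prev  # coalesce(session_start, last_session_start)
        result.append(row)
```

Start-only rows drop out of the output, but their start time survives via the carried-forward value, which is exactly why the search only works when the events are sorted so each end-only row immediately follows its matching start-only row.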

nick405060
Motivator

I literally just copied and pasted this and it worked perfectly. Thank you so much!
