Splunk Search

How to find the time period of successive table lines?

ctaf
Contributor

Hello,

I have a base search which output me something like this:

_time            src_host             src_ip
06/19/16      A                         10.10.10.10
06/20/16      A                         9.9.9.9
06/21/16      A                         9.9.9.9
06/22/16      A                         10.10.10.10
06/24/16      A                         10.10.10.10

And I'd like to have a summary of this data because I have hundred of lines.
Is it possible to get this output:

duration                                src_host                        src_ip
06/19/16,06/22/16-06/24/16              A                           10.10.10.10
06/20/16-06/21/16                       A                           9.9.9.9

Edit: Please note that successive values are surrounding with a dash "-" and non-successive values are separated by a comma ",".

Thank you !

0 Karma

javiergn
Super Champion

Try this:

your base search
| stats values(_time) as duration by src_host, src_ip

If you still want duration values separated by commas then you can do it this way:

your base search
| stats values(_time) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ",")

Hope that helps,
J

EDIT (other options as suggested below):

your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")

your base search
 | eval _time = strptime(_time, "%m/%d/%y")
 | sort limit=0 _time, src_host, src_ip
 | streamstats window=2 reset_on_change=t first(_time) as start_time, range(_time) as dur by src_host, src_ip
 | eval dur = round(dur/86400)
 | eval start_time = if(dur>1, _time, start_time)
 | streamstats reset_on_change=t first(start_time) as start_time by dur, src_host, src_ip
 | eventstats last(_time) as end_time by start_time, src_host, src_ip
 | eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
 | eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
 | eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
 | stats values(duration) as duration by src_host, src_ip
 | eval duration = mvjoin(duration, ", ")
 | sort limit=0 src_host, src_ip
0 Karma

ctaf
Contributor

thank you for your answer javiergn. But with your solution, I am grouping all the _time values on a single row without order. As you can see in my example, I am surrounding successif values with "-" and the rest is separated by ",". It gives me only the period where the IP was actually leased to the host.

0 Karma

javiergn
Super Champion

Hmm, that's a bit more complicated but doable I think.

See if the following help:

your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
0 Karma

ctaf
Contributor

Yes it is indeed more complicated. I am not sure to understand how streamstats works but unfortunately, it is not summed up enough, because the ouput is :

 02/16/16, 02/16/16, 02/16/16 - 02/17/16, 02/17/16 - 02/18/16.....

Following days are not summed up.

0 Karma

javiergn
Super Champion

What do you mean?

This is the csv I'm using to replicate your use case:

_time,src_host, src_ip
06/19/16,A,10.10.10.10
06/20/16,A,9.9.9.9
06/21/16,A,9.9.9.9
06/22/16,A,10.10.10.10
06/24/16,A,10.10.10.10

This is the query:

| inputcsv mycsv.csv
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 first(_time) as start_time by src_host, src_ip
| sort limit=0 - _time, + src_host, + src_ip
| streamstats window=2 first(_time) as end_time by src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| sort limit=0 _time, src_host, src_ip
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
| sort limit=0 src_host, src_ip

And this is the output:

src_host, src_ip, duration
A, 9.9.9.9, 06/20/16 - 06/21/16
A, 10.10.10.10, 06/19/16, 06/22/16 - 06/24/16 

Isn't that what you are looking for according to your sample?

0 Karma

ctaf
Contributor

Actually, in real, the data contains more lines. Like this:

_time,src_host, src_ip
 06/19/16,A,10.10.10.10
 06/20/16,A,9.9.9.9
 06/21/16,A,9.9.9.9
 06/22/16,A,10.10.10.10
 06/24/16,A,10.10.10.10
 06/25/16,A,10.10.10.10
 06/26/16,A,10.10.10.10
 06/28/16,A,10.10.10.10
 06/29/16,A,10.10.10.10
0 Karma

javiergn
Super Champion

so what would you expect out of this:

  06/22/16,A,10.10.10.10
  06/24/16,A,10.10.10.10
  06/25/16,A,10.10.10.10
  06/26/16,A,10.10.10.10
  06/28/16,A,10.10.10.10
  06/29/16,A,10.10.10.10

a) 06/22/16 - 06/29/16,A,10.10.10.10
b) 06/22/16, 06/24/16 - 06/29/16,A,10.10.10.10
?

0 Karma

ctaf
Contributor

c)

06/22/16, 06/24/16 - 06/26/16, 06/28/16 - 06/29/16         A           10.10.10.10
0 Karma

javiergn
Super Champion

OK, one more attempt:

your base search
| eval _time = strptime(_time, "%m/%d/%y")
| sort limit=0 _time, src_host, src_ip
| streamstats window=2 reset_on_change=t first(_time) as start_time, range(_time) as dur by src_host, src_ip
| eval dur = round(dur/86400)
| eval start_time = if(dur>1, _time, start_time)
| streamstats reset_on_change=t first(start_time) as start_time by dur, src_host, src_ip
| eventstats last(_time) as end_time by start_time, src_host, src_ip
| eval start_time = mvjoin(strftime(start_time, "%m/%d/%y"), " - ")
| eval end_time = mvjoin(strftime(end_time, "%m/%d/%y"), " - ")
| eval duration = if(start_time == end_time, start_time, start_time . " - " . end_time)
| stats values(duration) as duration by src_host, src_ip
| eval duration = mvjoin(duration, ", ")
| sort limit=0 src_host, src_ip

Output based on your sample above:

src_host   src_ip   duration
--------------------------------------------------------
A   9.9.9.9   06/20/16 - 06/21/16
A   10.10.10.10   06/19/16, 06/22/16, 06/24/16 - 06/26/16, 06/28/16 - 06/29/16 
0 Karma

ctaf
Contributor

Again, thank you for your help. But it is still strange because I have an output like this:

07/15/15 - 07/16/15, 07/20/15 - 07/24/15, 07/27/15 - 07/31/15, 08/03/15 - 08/07/15 ......

But in my base search output, I have a successive list of the same IP from 07/15/15 to 12/11/15

If I apply your search only from 07/15/15 to 12/11/15, I get the right output. But with a larger scope, I get

07/15/15 - 07/16/15, 07/20/15 - 07/24/15, 07/27/15 - 07/31/15, 08/03/15 - 08/07/15 ......
0 Karma
Get Updates on the Splunk Community!

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...