Splunk Search

Get preceding and following requests with common field value for each requested URL

cschardt
Engager

I'm trying to get a table with an row for each requested URL (the latest request found in the logfile) and the preceding and following URL that was requested from the same IP that made this last request.

Example logfile entries:
1.2.3.4 - - [15/Jul/2015:23:59:58 +0200] 0 "GET /index.html HTTP/1.1" 200 251
1.2.3.4 - - [15/Jul/2015:24:03:15 +0200] 0 "GET /somepage.html HTTP/1.1" 200 251
4.3.2.1 - - [15/Jul/2015:24:03:15 +0200] 0 "GET /differentpage.html HTTP/1.1" 200 251
1.2.3.4 - - [15/Jul/2015:24:29:01 +0200] 0 "GET /otherpage.html HTTP/1.1" 200 251
1.2.3.4 - - [15/Jul/2015:24:33:10 +0200] 0 "GET /yetanotherpage.html HTTP/1.1" 200 251

The result should look something like:

requested URL | preceding URL | following URL
/somepage.html | /index.html | /otherpage.html
/otherpage.html | /somepage.html | /yetanotherpage.html
...

So the values in the column "requested URL" would be unique. It would be great if I also could limit the time frame in which the preceding and following request may occur to for example 5 minutes.

Tags (2)
0 Karma
1 Solution

lguinn2
Legend

Here is my second answer, which uses a join. Neither of these solutions is particularly fast...

yourbasesearch
| sort ip_addr _time
| streamstats current=false window=2 earliest(URL) as preceding_URL latest(URL) as requested_url latest(_time) as requested_time by ip_addr 
| eval following_url=URL
| join type=inner requested_url requested_time ip_addr [ search yourbasesearch | sort URL -_time | dedup URL 
         |rename URL as requested_URL | rename _time as requested_time
         | table requested_time ip_addr requested_URL ]
| table requested_url preceding_URL following_url

View solution in original post

lguinn2
Legend

Here is my second answer, which uses a join. Neither of these solutions is particularly fast...

yourbasesearch
| sort ip_addr _time
| streamstats current=false window=2 earliest(URL) as preceding_URL latest(URL) as requested_url latest(_time) as requested_time by ip_addr 
| eval following_url=URL
| join type=inner requested_url requested_time ip_addr [ search yourbasesearch | sort URL -_time | dedup URL 
         |rename URL as requested_URL | rename _time as requested_time
         | table requested_time ip_addr requested_URL ]
| table requested_url preceding_URL following_url

cschardt
Engager

After slight changes this one worked for me. Here is the search that is working for me:

yourbasesearch
| sort ip_addr _time
| streamstats current=false window=2 earliest(URL) as preceding_url latest(URL) as requested_url latest(_time) as requested_time by ip_addr
| eval following_url=URL
| join type=inner requested_url requested_time ip_addr [ search yourbasesearch | sort -_time | dedup URL
|rename URL as requested_url | rename _time as requested_time
| table requested_time ip_addr requested_url ]
| table requested_url preceding_url following_url

I had to remove the URL from the sort in the subsearch since I only got one result when I used the URL sorting. Some of the URL variables had different names (sometimes URL written in lower case sometimes in upper case).

Thank you very much!

0 Karma

lguinn2
Legend

Initially, I couldn't figure out how to do it all in one search, so here is an answer using lookups.
This search will retrieve the latest URL in the log file for each unique value of URL:

yourbasesearch | sort URL -_time | dedup URL |eval result = "match" | table _time ip_addr URL result| outputlookup url_lookup

The following search will provide the table you want, based on the lookup file (and assuming that you defined the lookup url_lookup, with a default value of "no match")

yourbasesearch [ inputlookup url_lookup | fields ip_addr ]
| sort ip_addr _time
| streamstats current=false window=2 earliest(URL) as preceding_URL latest(URL) as requested_url  latest(_time) as requested_time by ip_addr 
| eval following_url=URL
| lookup url_lookup  requested_url AS URL ip_addr AS ip_addr _time as requested_time
| where result=="match"
| table requested_url preceding_URL following_url

I think this will work, let me know if you test it and it doesn't!

0 Karma

cschardt
Engager

I tried to set up the lookup, but it seems like I can't do it on our 32bit installation. I haven't used lookups yet, so I'm not sure if I searched for the right way to define the lookup, but I can't use KV Store lookups in the 32bit version.

0 Karma

lguinn2
Legend

There will be no "following URL" if the requested URL is the latest...

Oh, never mind, I think I get it now...

0 Karma

maciep
Champion

I don't have the full answer, but I'm thinking something with autoregress.

[your search data] | sort 0 ip _time | autoregress p=1 url AS next_url | sort 0 ip - _time | autoregress p=1 url AS prev_url | table ip prev_url url next_url

And I think you can use the same approach to gather the previous IP. If the IP's don't match, then set the next/prev url as "no following" or "no previous" as needed.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...