I'm trying to get a table with an row for each requested URL (the latest request found in the logfile) and the preceding and following URL that was requested from the same IP that made this last request.
Example logfile entries:
1.2.3.4 - - [15/Jul/2015:23:59:58 +0200] 0 "GET /index.html HTTP/1.1" 200 251
1.2.3.4 - - [15/Jul/2015:24:03:15 +0200] 0 "GET /somepage.html HTTP/1.1" 200 251
4.3.2.1 - - [15/Jul/2015:24:03:15 +0200] 0 "GET /differentpage.html HTTP/1.1" 200 251
1.2.3.4 - - [15/Jul/2015:24:29:01 +0200] 0 "GET /otherpage.html HTTP/1.1" 200 251
1.2.3.4 - - [15/Jul/2015:24:33:10 +0200] 0 "GET /yetanotherpage.html HTTP/1.1" 200 251
The result should look something like:
requested URL | preceding URL | following URL
/somepage.html | /index.html | /otherpage.html
/otherpage.html | /somepage.html | /yetanotherpage.html
...
So the values in the column "requested URL" would be unique. It would be great if I also could limit the time frame in which the preceding and following request may occur to for example 5 minutes.
Here is my second answer, which uses a join. Neither of these solutions is particularly fast...
yourbasesearch
| sort ip_addr _time
| streamstats current=false window=2 earliest(URL) as preceding_URL latest(URL) as requested_url latest(_time) as requested_time by ip_addr
| eval following_url=URL
| join type=inner requested_url requested_time ip_addr [ search yourbasesearch | sort URL -_time | dedup URL
|rename URL as requested_URL | rename _time as requested_time
| table requested_time ip_addr requested_URL ]
| table requested_url preceding_URL following_url
Here is my second answer, which uses a join. Neither of these solutions is particularly fast...
yourbasesearch
| sort ip_addr _time
| streamstats current=false window=2 earliest(URL) as preceding_URL latest(URL) as requested_url latest(_time) as requested_time by ip_addr
| eval following_url=URL
| join type=inner requested_url requested_time ip_addr [ search yourbasesearch | sort URL -_time | dedup URL
|rename URL as requested_URL | rename _time as requested_time
| table requested_time ip_addr requested_URL ]
| table requested_url preceding_URL following_url
After slight changes this one worked for me. Here is the search that is working for me:
yourbasesearch
| sort ip_addr _time
| streamstats current=false window=2 earliest(URL) as preceding_url latest(URL) as requested_url latest(_time) as requested_time by ip_addr
| eval following_url=URL
| join type=inner requested_url requested_time ip_addr [ search yourbasesearch | sort -_time | dedup URL
|rename URL as requested_url | rename _time as requested_time
| table requested_time ip_addr requested_url ]
| table requested_url preceding_url following_url
I had to remove the URL from the sort in the subsearch since I only got one result when I used the URL sorting. Some of the URL variables had different names (sometimes URL written in lower case sometimes in upper case).
Thank you very much!
Initially, I couldn't figure out how to do it all in one search, so here is an answer using lookups.
This search will retrieve the latest URL in the log file for each unique value of URL:
yourbasesearch | sort URL -_time | dedup URL |eval result = "match" | table _time ip_addr URL result| outputlookup url_lookup
The following search will provide the table you want, based on the lookup file (and assuming that you defined the lookup url_lookup
, with a default value of "no match")
yourbasesearch [ inputlookup url_lookup | fields ip_addr ]
| sort ip_addr _time
| streamstats current=false window=2 earliest(URL) as preceding_URL latest(URL) as requested_url latest(_time) as requested_time by ip_addr
| eval following_url=URL
| lookup url_lookup requested_url AS URL ip_addr AS ip_addr _time as requested_time
| where result=="match"
| table requested_url preceding_URL following_url
I think this will work, let me know if you test it and it doesn't!
I tried to set up the lookup, but it seems like I can't do it on our 32bit installation. I haven't used lookups yet, so I'm not sure if I searched for the right way to define the lookup, but I can't use KV Store lookups in the 32bit version.
There will be no "following URL" if the requested URL is the latest...
Oh, never mind, I think I get it now...
I don't have the full answer, but I'm thinking something with autoregress.
[your search data] | sort 0 ip _time | autoregress p=1 url AS next_url | sort 0 ip - _time | autoregress p=1 url AS prev_url | table ip prev_url url next_url
And I think you can use the same approach to gather the previous IP. If the IP's don't match, then set the next/prev url as "no following" or "no previous" as needed.