Splunk Search

How can I catch the both the first and last occurance of an event?

PHRaymond
Explorer

Scenario: figure out what user is using a given IP at a given point in time by using proxy logs, which captures the user's ID each time they visit a web page.

Right now I do two separate searches. This one catches the last time the user's ID shows in the log for the IP in the time period being searched:

192.168.1.1 sourcetype=proxy | dedup user | sort-_time | table _time, user,
IP

This gives me a result of:

_time user IP

1 18/03/2012 18:57:27.000 Alice 192.168.1.1

2 18/03/2012 12:47:41.000 Bob 192.168.1.1

So there's the last time they used the IP for the range searched. I then run a similar search that gives me the first time the user's ID shows:

192.168.1.1 sourcetype=proxy | dedup user sortby +_time | sort-_time |
table _time, user, IP

This gives me a result of:

_time user IP

1 18/03/2012 13:44:42.000 Alice 192.168.1.1

2 18/03/2012 07:44:40.000 Bob 192.168.1.1

So there's the first time that they used the IP. Looking at the two, I know that Alice first used the IP at 7:44 AM and last used it at 18:57 PM.

However, I have to run both searches and notate the results elsewhere in order to calculate that. Easy enough in this example, but when I have a few dozen people sharing dynamic IPs (think a VPN type of situation), it gets pretty hairy pretty fast.

So, what I want is a search that combines both of of the above and gives me an output like this:

_time user IP Start/Stop

1 18/03/2012 18:57:27.000 Alice 192.168.1.1 Stop

2 18/03/2012 13:44:42.000 Alice 192.168.1.1 Start

3 18/03/2012 12:47:41.000 Bob 192.168.1.1 Stop

4 18/03/2012 07:44:40.000 Bob 192.168.1.1 Start

Thoughts?

Thank you.

Peter

Tags (2)
0 Karma

gkanapathy
Splunk Employee
Splunk Employee

Better would simply be:

sourcetype=proxy ip=192.168.1.1 
| stats earliest(_time) as start, latest(_time) as stop
  by user,ip

This will also run faster (basically twice as fast) as a solution that has to get earliest and latest, since it does a single pass through the data. If you really want to split out separate rows for start and stop (instead of just having them be separate columns), you could do:

sourcetype=proxy ip=192.168.1.1 
| stats earliest(_time) as start, latest(_time) as stop
  by user,ip
| eval uip=user+";"+ip
| fields uip start stop
| untable uip action _time
| eval uip=split(uip,";") | eval user=mvindex(uip,0) | eval ip=mvindex(uip,1)

But you might be better off with it as above.

gkanapathy
Splunk Employee
Splunk Employee

sort before you convert: 192.168.1.1 sourcetype=proxy | stats earliest(_time) as start, latest(_time) as stop by user,ip | sort - stop | fieldformat start=(start, "%d/%m/%y %H:%M:%S") | fieldformat stop=(stop, "%d/%m/%y %H:%M:%S")

0 Karma

PHRaymond
Explorer

I'm running some test scenarios and the first question that pops up is: how can I get the results to sort so that the "stop" column is ordered latest to earliest? (This is coming up as I experiment with using multiple sourcetypes to build a bigger picture; some proxy logs don't have a user but, say, a VPN log for the same IP might.)

0 Karma

PHRaymond
Explorer

(As for the why on that last, I use a macro that fills in everything after the IP for me automatically, so my methodology is to type/paste in the IP I'm looking for and then kick off the macro.)

0 Karma

PHRaymond
Explorer

Ok, now we're cooking with gas! I like "twice as fast" and "single pass" quite a bit, and I'm not wedded to the separate line idea; in fact, the columns makes even more sense.

So, combining this with Damien's time conversion, I've got:

sourcetype=proxy ip=192.168.1.1 | eval time=strftime(_time, "%d/%m/%y %H:%M:%S") | stats earliest(time) as start, latest(time) as stop by user,ip

I like it. Now, question, would this work as efficiently:

192.168.1.1 sourcetype=proxy | eval time=strftime(_time, "%d/%m/%y %H:%M:%S") | stats earliest(time) as start, latest(time) as stop by user,ip

0 Karma

Damien_Dallimor
Ultra Champion

As GKC suggested , perhaps a transaction might be better.

But anyhow to try and do what you asked for , try something like :

sourcetype=proxy| stats first(_time) as _time by user,IP | eval action="Stop" | append [search sourcetype=proxy | stats last(_time) as _time by user,IP |eval action="Start"] | eval time=strftime(_time, "%d/%m/%y %H:%M:%S") | table time user IP action

Should produce the results in the format you specified for any IP used by the users within your chosen search time range.

0 Karma

Damien_Dallimor
Ultra Champion

from the Convert docs:

Note: This command is mostly deprecated, and its functionality has been re-worked as functions of the eval command such as strftime(), strptime(), or tostring().

Damien_Dallimor
Ultra Champion

edited , had a missing "search" function.

0 Karma

PHRaymond
Explorer

I'm poking at GKC's answer now. I'm certainly open to better searches. That's why I'm here!

On yours, I try to run it and I get:

"Search operation 'sourcetype' is unknown. You might not have permission to run this operation."

That seems odd to me...

Unrelated, your " eval time=strftime(_time, "%d/%m/%y %H:%M:%S") " intrigues me. I've been using " convert ctime(_time) as timestamp " for what I think is the same purpose. Why might one use one over the other?

0 Karma

GKC_DavidAnso
Path Finder

I don't know quite what you're trying to do with the end search, but I would be inclined to do something like"

192.168.1.1 sourcetype=proxy |transaction user IP | table _time, user, IP

It is quite possible that I will end up on the same IP in different sesisons, so your first and last search might not be completely accurate.

0 Karma

GKC_DavidAnso
Path Finder

You can find out more about transactions here: http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/transaction

Try this: 192.168.1.1 sourcetype=proxy |transaction user IP startswith=Start endswith=Stop | table _time, user, IP, duration

The transaction joins like events together. user and IP are the fields that we are using to join. The transaction calculates the duration between the first and last events for you and saves it into a value called duration.

I can see you've already got some other quite nice solutions too.

0 Karma

PHRaymond
Explorer

For example, on my prod box, running a 30 day query against an IP gives me 8 results using the transaction string you list, and 3 users. Removing it gives me 5,962 results, still 3 users. Why those 8 in the first and so many in the second? I need a confidence level that I'm not missing something.

I must go poke at this some more....

0 Karma

PHRaymond
Explorer

Hmmm. You got me to thinking with your comment "don't know quite what you're trying to do with the end search." I was so focused on the first/last time aspect that perhaps I was missing what I'm really after. If I just do my search against the first occurance in a time frame, the simple existence of a new user showing up means the prior user had to have logged off.

And I hadn't thought about the possibility of one user pulling the same IP in two sessions across a period of time!

So, transaction. Interesting. What precisely does that get me?

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...