Splunk Search

searching multiple sources and concatenating results to one row per event

anderswesterber
New Member

Hi, first time trying to join several logsources in Splunk and it's been a nightmare ;)!

Use case: I have one log source (auth) that has timestamp, src_ip, user, login|logout and another log source (browse) that has timestamp, src_ip, url.

I want a result that shows one row for each event in the browse log source with the timestamp, IP address, url, and user fields, and I need to be able to search over any time interval.

One issue is that an event in browse can happen as long as 8 hours after the login event that matches it (the maximum login session is 8h). Another is that several users might have logged in and out on the same client several times during the last 8 hours, so they share the same src_ip.

I've read a lot about how to join logs and have tried transaction, stats, join and lookup tables, but I never end up with a result that I can trust. I'm beginning to suspect that this was not the best type of logs to begin my Splunk learning curve with ;)! The closest I have come so far is this:

sourcetype="browse" starttime="03/05/2012:20:00:00" endtime="03/05/2012:20:10:00"| join src_ip [search starttime="03/05/2012:12:00:00" endtime="03/05/2012:20:10:00" sourcetype="auth"]

Starttime/endtime does work in subsearches even though I've seen several posts claiming it doesn't; at least in 4.3 it seems to work. I need the starttime/endtime values to make the subsearch cover the relevant time range. I've tried using eval to declare variables but can't figure out how to declare variables before a search. Why doesn't this work (I take the $start$ and $end$ times from input fields in a view)?

sourcetype="browse" starttime="$start$" endtime="$end$"|join src_ip [*|eval startepoch = strptime(start, "%m/%d/%Y:%H:%M:%S")|eval startrelative=relative_time(startepoch,"-8h@s")|convert timeformat="%m/%d/%Y:%H:%M:%S" ctime(startrelative)|search starttime="$startrelative$" endtime="$end$" sourcetype="auth"]

And even if I get it to work I can't trust the results, since I'm not using the logouts at all. I tried lookup tables, but unless I make the lookup table cover all time I don't understand how to use it for historic queries, and I'm still not sure how to handle multiple logins from the same src_ip.

Sorry for the very long post, I hope someone can point me in the right direction.


anderswesterber
New Member

Oh, thank you, looks interesting. I'll try this out as soon as possible and get back with the results.


lguinn2
Legend

Ideas for searches that might run more quickly

(sourcetype=auth user=TheUserID) OR sourcetype=browse
| fields src_ip user url
| transaction src_ip startswith="login" endswith="logout" maxspan=8h 
| table _time, user, src_ip, url

This search limits the events from the auth sourcetype to a particular user. This will (hopefully) make the number of transactions smaller. The time range of this search could be set by using the time range drop-down or by setting earliest/latest in the form.

Another way to do it

sourcetype=browse [search sourcetype=auth user=TheUserID | dedup src_ip | fields src_ip] 
| append [search sourcetype=auth user=TheUserID ]
| transaction src_ip startswith="login" endswith="logout" 
| table _time, src_ip, url

This search first selects all events of sourcetype "browse" that might have been used by TheUser. This will hopefully eliminate many events from consideration up front. Then it adds in the "auth" events for TheUser, and creates a transaction. This will eliminate any browse events that were not associated with TheUser. This may run faster because the transaction command will have fewer events to process - possibly thousands fewer.

Adding in your time constraints for each of the two searches

(sourcetype=auth user=TheUserID) OR sourcetype=browse
| eval startTime=strptime("$start$","%m/%d/%Y:%H:%M:%S")
| eval endTime=strptime("$end$","%m/%d/%Y:%H:%M:%S")
| where _time>=startTime and _time<=endTime
| fields src_ip user url
| transaction src_ip startswith="login" endswith="logout" maxspan=8h 
| table _time, user, src_ip, url

Second search with time

sourcetype=browse [search sourcetype=auth user=TheUserID 
    | eval startTime=strptime("$start$","%m/%d/%Y:%H:%M:%S")
    | eval endTime=strptime("$end$","%m/%d/%Y:%H:%M:%S")
    | where _time>=startTime and _time<=endTime
    | dedup src_ip | fields src_ip] 
| eval startTime=strptime("$start$","%m/%d/%Y:%H:%M:%S")
| eval endTime=strptime("$end$","%m/%d/%Y:%H:%M:%S")
| where _time>=startTime and _time<=endTime
| append [search sourcetype=auth user=TheUserID 
    | eval startTime=strptime("$start$","%m/%d/%Y:%H:%M:%S")
    | eval endTime=strptime("$end$","%m/%d/%Y:%H:%M:%S")
    | where _time>=startTime and _time<=endTime ]
| transaction src_ip startswith="login" endswith="logout" 
| table _time, src_ip, url

This is more complicated, of course.

If you are running these searches in the Splunk GUI, all of them will run faster if you also pick a time range from the drop-down.

Why do you have to perform the search (using the drop-down time range) and then select the results in your time range? It's a little complicated, but here is the reasoning.

First, you can put an eval command at the beginning of the "search". This works:

| eval myVar="xyz" | search sourcetype=auth user="user name"

This also works, if $username$ and $newValue$ were defined in your form.

| eval myVar="$newValue$" | search sourcetype=auth user="$username$"

But this will not work:

| eval myVar="$newValue$" | search sourcetype=auth user=myVar

The search command does not compare two fields. The right side of the equals sign is always treated as a literal value.
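
To compare two fields, the where command can be used instead, because it evaluates an expression rather than matching a field against a literal. A minimal sketch, reusing the myVar field and the $newValue$ form token from the examples above (both are placeholders, not names from the actual data):

```
sourcetype=auth
| eval myVar="$newValue$"
| where user=myVar
```

Note that string comparisons in where are case-sensitive, unlike field=value terms in the search command.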

This is why the searches above filter on _time with where after the events have been retrieved. Setting some earliest time that cannot be exceeded (like the last 30 days) will constrain the initial search and make it run faster. You should also set a time constraint on the subsearch; the subsearch does not automatically use the outer search's time constraints.
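
As an illustration of constraining both parts, the earliest and latest time modifiers can be written into the outer search and into the subsearch. This is only a sketch: the 30-day window is the example figure from the text, and TheUserID is the same placeholder used in the searches above.

```
sourcetype=browse earliest=-30d@d latest=now
    [search sourcetype=auth user=TheUserID earliest=-30d@d latest=now
     | dedup src_ip
     | fields src_ip]
```

The exact start and end filtering with where can then be applied to this smaller set of events.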

Hope this helps you take it to the next level...


lguinn2
Legend

"You answered your own question" just means that you posted an "answer" to a question that you had written.

The transaction command is limited to the timeframe you set in the main search, absolutely. And no matter what timeframe you choose, there may be some folks who logged-in before that and who therefore will not have a "login" event. You are also right that the timeframe limits both sourcetypes.

maxspan=8h just means that a single transaction cannot be longer than 8 hours. It's a way of capturing the rule that a login session cannot exceed 8 hours. This information also helps Splunk create transactions more efficiently.
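
For users who logged in before the selected timeframe, one possible workaround is to start the search 8 hours earlier and carry the most recent login forward onto each browse event with streamstats instead of transaction. The following is an untested sketch under several assumptions: the field name action (holding login or logout) is hypothetical, session_user and session_start are made-up helper fields, and the timestamps are the example values from the original question.

```
(sourcetype=auth OR sourcetype=browse) earliest="03/05/2012:12:00:00" latest="03/05/2012:20:10:00"
| sort 0 src_ip _time
| eval session_user=case(sourcetype=="auth" AND action=="login", user, sourcetype=="auth" AND action=="logout", "-")
| eval session_start=if(sourcetype=="auth", _time, null())
| streamstats last(session_user) as session_user last(session_start) as session_start by src_ip
| where sourcetype=="browse" AND session_user!="-" AND _time-session_start<=28800
| where _time>=strptime("03/05/2012:20:00:00","%m/%d/%Y:%H:%M:%S")
| table _time src_ip url session_user
```

The 8-hour lead on earliest covers the maximum session length, a logout resets the carried user to "-" so later browse events on that src_ip are not attributed to anyone, and the final where trims the output back to the requested window. Sorting a very large number of events can be expensive, so the caveats about event volume still apply.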

Try the transaction command, although you may well have a "huge" number of events. It may not work, as you found out before. But now that I see that you are reporting on a particular user, I'll try to think of other ways to search for this...


anderswesterber
New Member

First of all, I don't get how these posts are supposed to work: "you answered your own question"?

Thanks for the response but I'm still a bit confused. My main issue with the transaction command was that the results showed a lot of events that had no matching login event so I figured the transaction command was limited to the timeframe you set in the main search, is that wrong?

Maxspan=8h just seems to limit maxspan from the default of unlimited, which I assume just means the whole specified timeframe?

Wouldn't the timeframe limit both sourcetypes? And if I, for example, use a timeframe like 02:00-04:00, would I even see a login event from the auth source that happened before that timeframe?

What is a "huge number of events"? A high production hour generates over 300K events and a 24h period generates overall 1M to 1.2M events.. quite often I'm asked to deliver reports on a user over several days/weeks.


lguinn2
Legend

Try this

sourcetype=auth OR sourcetype=browse | transaction src_ip startswith="login" endswith="logout" maxspan=8h 

Take a look at your results. Read about the transaction command in the manual, too. There are some limitations; if you have a huge number of events, Splunk may not have sufficient memory to figure it out.
Next, consider how you really want to see the results. It is possible that the following may work for you

sourcetype=auth OR sourcetype=browse | 
transaction src_ip startswith="login" endswith="logout" maxspan=8h |
table _time, user, src_ip, url

If not, then ask again...
