Splunk Search

How to join two searches that have two common fields and put a condition on one of the common fields?

GauriSplunk
Path Finder

I want to do a join of two searches that have a common field ID and time, but I want to have a condition on time when IDs match.
Consider two tables user-info and some-hits

user-info

name ipaddress time
user1 20.20.20.20 t0
user2 20.20.20.20 t1
user1 30.30.30.30 t2

some-hits
ipaddress hits time
20.20.20.20 10 t0.03
20.20.20.20 40 t0.03
20.20.20.20 46 t0.9
30.30.30.30 60 t1
30.30.30.30 78 t1.5

Here I want to match all records from some-hits with user-info whose ipaddress match, but some-hits.time < user-info.time
So t0.03 should match with t0 . t0.9 should match with t1. t1 and t1.5 should match with t2.

I wanted to know how I can use join to achieve this?
Thanks

0 Karma

MuS
SplunkTrust
SplunkTrust

Hi GauriSplunk,

after all your feedback here is a new answer, which is tested and working all based on your provided information.

First create a csv file called user-history containing:

username, ipaddress, logintime, logouttime
user1, 20.20.20.20, 12.00, 12.05
user2, 20.20.20.20, 12:10, 12:50
user1, 30.30.30.30, 11:40, 11:55

next create a csv file called hits-table containing:

Clientid, hits, time, domain
20.20.20.20, 2, 12:02, fb.com
20.20.20.20, 3, 12:02, fb.com
30.30.30.30, 5, 11:45, boo.com

Load them into Splunk using the methods from the docs http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Getstartedwithgettingdatain or us the URI http://YourSplunkServer/en-US/manager/search/adddata. Once the data is indexed use this search to get the result based on your last comment:

source="user-history.csv" OR source="hits-table.csv" host="indexer" sourcetype="csv" 
| eval ClientId=coalesce(Clientid,ipaddress) 
| fields ClientId, username, domain, time, hits 
| streamstats values(username) AS user by ClientId 
| stats sum(hits) AS total by user, ClientId, domain, time

Let me break down this and explain what is happening here:
This is the base search which will return all needed fields to get to the result.

source="user-history.csv" OR source="hits-table.csv" host="indexer" sourcetype="csv" 

Next step is to use either Clientid or ipaddress as field ClientId

| eval ClientId=coalesce(Clientid,ipaddress) 

Now we limit the used fields for the next steps

| fields ClientId, username, domain, time, hits 

This is the most important step for you, because it will map the users to ClientId

| streamstats values(username) AS user by ClientId 

so it can be used in the final stats to get the total hits on a domain by user and time

| stats sum(hits) AS total by user, ClientId, domain, time

If this still does not match your requirement, modify it until your done 😉

Hope this helps and it will be much more efficient then join and you will not hit any sub-search limits.

cheers, MuS

0 Karma

GauriSplunk
Path Finder

Thanks MuS for your feedback. I will work on this.
I need to access _time value from the event generated. Since _time is present in all events. If I need to access _time from hits-table, how do i do it.
I want to compare _time from hits_table such that it should be between logout-time and login_time of user-history to generate a match in addition to ip_address.
-Gauri

0 Karma

MuS
SplunkTrust
SplunkTrust

Start working on the provided example, the last step will be a minor one - hint :

| where time > logintime AND time < logouttime

But you have to use a convert dur2sec() on all time values to be able to compare them; Splunk does not like for example the : in logintime or time

You will get there 😉

0 Karma

GauriSplunk
Path Finder

I need to access the event generated time which splunk stores in _time field.
Since this field is same for hits_table and user_history, how cna i specify that i want to read the _time from hits_table and not user_history.
sorry , I am doing this for the first time hence so many questions.

0 Karma

MuS
SplunkTrust
SplunkTrust

hint:

 ...| eval myTime=if(like(source, "%hits_table.csv"), _time, null())

see docs http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/CommonEvalFunctions

0 Karma

GauriSplunk
Path Finder

I created csv files as u suggested :
username, ipaddress, logintime, logouttime
user1, 20.20.20.20, 11/3/2015 12:00:00, 11/3/2015 12:05:00
user2, 20.20.20.20, 11/3/2015 12:10:00, 11/3/2015 12:50:00
user1, 30.30.30.30, 11/3/2015 11:40:00, 11/3/2015 11:55:00

Clientid, hits, time, domain
20.20.20.20, 2, 11/3/2015 12:02:00, fb.com
20.20.20.20, 3, 11/3/2015 12:02:00, fb.com
30.30.30.30, 5, 11/3/2015 11:45:00, boo.com

i uploaded them into splunk under index=ib_test_sample
This is the query
index="ib_test_sample" sourcetype="csv"|eval ClientId=coalesce(Clientid,ipaddress)| fields ClientId, username, domain, time, hits,logintime,logouttime| eval start_time=strptime(logintime, "%m/%d/%Y %H:%M:%S")| eval end_time=strptime(logouttime, "%m/%d/%Y %H:%M:%S") |eval dns_time=strptime(time, "%m/%d/%Y %H:%M:%S")|streamstats values(username) AS user by ClientId|stats sum(hits) AS total by user, ClientId, domain| table user, ClientId, total,dns_time,start_time,end_time

i havent added time where time is between logintime and logout time yet sinc e this is not working.
The result I get is

user ClientId total dns_time start_time end_time
user2 20.20.20.20 5

The time fields are not displayed and the result just has 1 entry which is incorrect.

0 Karma

MuS
SplunkTrust
SplunkTrust

Hi GauriSplunk,

join is the last resort to solve search problems, not the first choice - see docs http://docs.splunk.com/Documentation/Splunk/6.3.0/SearchReference/Join or this https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...

Your problem here is the value of time is not a number, its a string and therefore Splunk will not do what you expect because it will compare it differently. First you need to remove the t from the time values and convert it to a numeric value:

 eval time=tonumber(trim(time, "t"))

Next you can get the two time values into new field depending on the source:

eval user_time=tonumber(trim(like(source, "%user-info%"), "t")) | eval some_time=tonumber(trim(like(source, "%some-info%"), "t")) 

And finally use the new time fields to compare them:

your base search here 
| eval user_time=tonumber(trim(like(source, "%user-info%"), "t")) 
| eval some_time=tonumber(trim(like(source, "%some-info%"), "t")) 
| stats count by ipaddress, name, hits, user_time, some_time
| where some_time < user_time

This is un-tested so you probably need to tweak it, but it should give you some hints how it can be done.

cheers, MuS

0 Karma

GauriSplunk
Path Finder

Thanks for your reply MuS. In my case , both searches have diff index. one of the search is a summary search and the other is a detailed search. The detailed search is my main search (user-info) . The summary search (some-hits) is the second search.

In the link you sent, https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...
the index is the same but different sourcetypes

Can I write a query w/o join in this use case?
Thanks
-Gauri

0 Karma

MuS
SplunkTrust
SplunkTrust

Hi Gauri,

it does not matter if your events are in different index, source or sourcetypes you can most likely use stats instead of join 😉
Just use as base search something like this:

index=a Or index=b sourcetype=c OR sourcetype=d | more Splunk Fu ...

cheers, MuS

0 Karma

GauriSplunk
Path Finder

ok . I will try it.
I also need to find the total hits for all the matched ipaddress and time event.
in the example above, I am expecting an output like:

name time ipaddress #hits

user1 t0 20.20.20.20 50 (10 + 40)
user2 t1 20.20.20.20 46
user1 t2 30.30.30.30 138 (60 + 78)

Can i calculate sum for every matched combination from two search result sets?

0 Karma

MuS
SplunkTrust
SplunkTrust

For my above example it would be like this:

your base search here 
 | eval user_time=tonumber(trim(like(source, "%user-info%"), "t")) 
 | eval some_time=tonumber(trim(like(source, "%some-info%"), "t")) 
 | stats sum(hits) AS total_hits by ipaddress, name, hits, user_time, some_time
 | where some_time < user_time
0 Karma

GauriSplunk
Path Finder

hi MuS,
Thanks for your help.
I tried to use OR for my base search like u said:
(sourcetype=ib:reserved source=ib:user index=ib_security) OR index=ib_summary report=si_hits

I first just wanted ot try this out to see what events I get.
I just get results for the summary search ( index=ib_summary report=si_hits).
I do not get any events from first search (ib:user)

What i am missing here?
Thanks

0 Karma

MuS
SplunkTrust
SplunkTrust

Well run each search seperate and see if you get results back:

 (sourcetype=ib:reserved source=ib:user index=ib_security)

and then

 index=ib_summary report=si_hits 

BTW, as mentioned in your earliest=-1w question, you should NOT set source in inputs.conf see the docs why : http://docs.splunk.com/Documentation/Splunk/6.3.1/admin/Inputsconf

0 Karma

GauriSplunk
Path Finder

Hi MuS,
The solution given above does not work for me. I think it was interpreted wrongly.
Here is my use case:

I have search user-history

username ipaddress logintime logout time
user1 20.20.20.20 12.00 12.05
user2 20.20.20.20 12:10 12:50
user1 30.30.30.30 11:40 11:55

I have another search hits-table
Clientid hits time domain
20.20.20.20 2 12:02 fb.com
20.20.20.20 3 12:02 fb.com
30.30.30.30 5 11:45 boo.com

In second search I want to first find total hits by time and domain which gives me

clientid total-count time domain
20.20.20.20 5 12:02 fb.com
30.30.30.30 5 11:45 boo.com

Now i want to match the user-history results with this such that my result table should look like
Here user2 did not have any hits in the time it was logged in. So its record will not be in result table.

The match should be on ipaddress==clientid and time is between logintime and logout time
username ipaddress #hits time

user1 20.,20.20.20 5 _time value from hits-table
user1 30.30.30.30 5 _time value from hits-table

Can this be done without join?
What would be the best approach considering the user-history table would be large

Thanks
-Gauri

0 Karma

sundareshr
Legend

Just to add clarity to your search, what you have is

(sourcetype=ib:reserved AND source=ib:user AND index=ib_security) OR index=ib_summary AND report=si_hits. In other words, this will match

(sourcetype=ib:reserved AND source=ib:user AND index=ib_security) AND report=si_hits

OR

(index=ib_summary AND report=si_hits)

What I think you are looking for is

(sourcetype=ib:reserved source=ib:user index=ib_security) OR (index=ib_summary report=si_hits)

where report is a key field only found in the summary index. Give that a shot.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...