Solved: How to join two searches by closest time fields in...

vbarna · ‎08-10-2015

Hi all,

I am going to simplify my problem. I have two indexes with the following variables:

index 1: time_in user_id

10:55 user1
11:20 user1
12:03 user2

index 2: time_reg user_id colour

10:50 user1 green
10:56 user1 yellow
10:57 user1 red
10:58 user2 green

Neither time_in nor time_reg are the same as _time, and their values are never equal. I want to join the event of index1 with events of index2. Each event must just have a match with the criteria of being the closest in time, but time_reg from index2 can never be after time_in of index1. The result of the join in the example would be:

JOIN: time_in time_reg user_id colour

10:55 10:50 user1 green
11:20 10:57 user1 red
12:03 10:58 user2 green

I have seen similar questions like this that they use 'transaction' or 'localize'. However, I have not found the way to use these commands without relying on the variable _time. Instead I want to not use _time at all and use time_in and time_reg.

Thank you very much in advance!

somesoni2 · ‎08-10-2015

I don't think there is any efficient way to do this, but give this approach a try.

index=IndexA OR index=indexB  | table time_in time_reg user_id colour index| eval commonfield=coalesce(time_in,time_reg) | sort 0 user_id -commonfield | streamstats current=f window=1 values(user_id) as prevUser values(index) as prevIndex prev(commonfield) as prevTime | where user_id=prevUser AND prevIndex=index AND index="indexB" | rename commonfield as time_reg prevTime as time_in   | table time_in time_reg user_id colour

View solution in original post

vbarna · ‎08-11-2015

Thanks for both answers.

At the end I decided to implement a really inefficient solution (I execute this process really seldom so it doesn't matter) that would return more or less what I wanted.
What I do is a join between the two tables on user_id. Then, after the join I do:

eval diff_times=time_in-time_reg | search diff_times>=0 AND diff_times<600000

So at the end I filter the results where the two times are within a range of 10 minutes. I know that this is a really poor solution, but I find joins and time related operations quite difficult in splunk.

woodcock · ‎08-11-2015

You should "Accept" your answer and also any other answers that you think are correct to close out the question.

somesoni2 · ‎08-10-2015

I don't think there is any efficient way to do this, but give this approach a try.

index=IndexA OR index=indexB  | table time_in time_reg user_id colour index| eval commonfield=coalesce(time_in,time_reg) | sort 0 user_id -commonfield | streamstats current=f window=1 values(user_id) as prevUser values(index) as prevIndex prev(commonfield) as prevTime | where user_id=prevUser AND prevIndex=index AND index="indexB" | rename commonfield as time_reg prevTime as time_in   | table time_in time_reg user_id colour

woodcock · ‎08-10-2015

You should be able to override the original _time variable and then use the solutions that you have already seen like this:

... | eval _time = coalesce(time_in, time_reg, _time) | ...

How to join two searches by closest time fields in my two indexes, not using the _time field?

index 1: time_in user_id

index 2: time_reg user_id colour

JOIN: time_in time_reg user_id colour

Splunk Custom Visualizations App End of Life

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk