Splunk Search

How to join two searches by closest time fields in my two indexes, not using the _time field?

vbarna
Engager

Hi all,

I am going to simplify my problem. I have two indexes with the following variables:

index 1: time_in user_id

10:55 user1
11:20 user1
12:03 user2

index 2: time_reg user_id colour

10:50 user1 green
10:56 user1 yellow
10:57 user1 red
10:58 user2 green

Neither time_in nor time_reg are the same as _time, and their values are never equal. I want to join the event of index1 with events of index2. Each event must just have a match with the criteria of being the closest in time, but time_reg from index2 can never be after time_in of index1. The result of the join in the example would be:

JOIN: time_in time_reg user_id colour

10:55 10:50 user1 green
11:20 10:57 user1 red
12:03 10:58 user2 green

I have seen similar questions like this that they use 'transaction' or 'localize'. However, I have not found the way to use these commands without relying on the variable _time. Instead I want to not use _time at all and use time_in and time_reg.

Thank you very much in advance!

Tags (2)
0 Karma
1 Solution

somesoni2
Revered Legend

I don't think there is any efficient way to do this, but give this approach a try.

index=IndexA OR index=indexB  | table time_in time_reg user_id colour index| eval commonfield=coalesce(time_in,time_reg) | sort 0 user_id -commonfield | streamstats current=f window=1 values(user_id) as prevUser values(index) as prevIndex prev(commonfield) as prevTime | where user_id=prevUser AND prevIndex=index AND index="indexB" | rename commonfield as time_reg prevTime as time_in   | table time_in time_reg user_id colour

View solution in original post

0 Karma

vbarna
Engager

Thanks for both answers.

At the end I decided to implement a really inefficient solution (I execute this process really seldom so it doesn't matter) that would return more or less what I wanted.
What I do is a join between the two tables on user_id. Then, after the join I do:

eval diff_times=time_in-time_reg | search diff_times>=0 AND diff_times<600000

So at the end I filter the results where the two times are within a range of 10 minutes. I know that this is a really poor solution, but I find joins and time related operations quite difficult in splunk.

0 Karma

woodcock
Esteemed Legend

You should "Accept" your answer and also any other answers that you think are correct to close out the question.

0 Karma

somesoni2
Revered Legend

I don't think there is any efficient way to do this, but give this approach a try.

index=IndexA OR index=indexB  | table time_in time_reg user_id colour index| eval commonfield=coalesce(time_in,time_reg) | sort 0 user_id -commonfield | streamstats current=f window=1 values(user_id) as prevUser values(index) as prevIndex prev(commonfield) as prevTime | where user_id=prevUser AND prevIndex=index AND index="indexB" | rename commonfield as time_reg prevTime as time_in   | table time_in time_reg user_id colour
0 Karma

woodcock
Esteemed Legend

You should be able to override the original _time variable and then use the solutions that you have already seen like this:

... | eval _time = coalesce(time_in, time_reg, _time) | ...
0 Karma
Get Updates on the Splunk Community!

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...