Splunk Search

Join in Splunk

cx233alvin
Explorer

Hi there!

Just want to ask if it is possible to execute a non-equijoin in Splunk? A non-equijoin (in SQL) joins two tables using a non-equality operator such as <, >, BETWEEN ... AND, etc.

I have two tables:

Employees
Staff ID | Log-in | Log-out | Location
1001 | 12-30-2017 11:30:00 | 12-30-2017 23:30:00 | LDP1
1002 | 12-30-2017 12:30:00 | 12-30-2017 20:30:00 | LDP3

Sales
Txn Date Time | Location | Txn Type
12-30-2017 11:48:17 | LDP1 | Sale

I want to count the number of transactions done by each staff member, based on their location and their log-in and log-out times.

Example:
Staff ID | Date |Location | Count of Txns
1 | 12-30-2017 | LDP 1 | 345
2 | 12-30-2017 | LDP 2 | 416
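In SQL, the join I have in mind would look roughly like this (table and column names are just for illustration):

```sql
SELECT e.staff_id,
       CAST(s.txn_date_time AS DATE) AS txn_date,
       e.location,
       COUNT(*) AS txn_count
FROM employees e
JOIN sales s
  ON  s.location = e.location                       -- equality condition
  AND s.txn_date_time BETWEEN e.login AND e.logout  -- non-equijoin condition
WHERE s.txn_type = 'Sale'
GROUP BY e.staff_id, CAST(s.txn_date_time AS DATE), e.location;
```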

Thanks in advance.

0 Karma
1 Solution

lguinn2
Legend

Try this

(index="sales" AND Txn-Type="Sale") OR (index="employee")
| eval timestamp=strptime(if(index=="employee",'Log-in','Txn-Date-Time'),"%m-%d-%Y %H:%M:%S")
| sort Location "Staff-ID" timestamp index
| streamstats current=f reset_on_change=t 
        last("Staff-ID") as employee last("Log-in") as login last("Log-out") as logout by Location "Staff-ID"
| eval login=strptime(login,"%m-%d-%Y %H:%M:%S")
| eval logout=strptime(logout,"%m-%d-%Y %H:%M:%S")
| where index=="sales" AND timestamp>=login AND timestamp<=logout
| eval Date=strftime(timestamp,"%x")
| stats count as "Count of Txns" by employee Date Location

You may need to play around with the streamstats command to get this to work as you want. I didn't have any test data to verify that this will work.


cx233alvin
Explorer

Thanks @niketnilay and @lguinn for your replies. I was able to do it based on your suggestions.

Similar to @lguinn's approach, I sorted the rows by location and loginTime/txnDateTime. Then I used streamstats to get the last login, logout, and employeeID. I used eval to check whether each sale transaction falls within the login and logout times (set 1 if yes, otherwise 0). Finally, I used sum to get the total number of transactions per teller.
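That approach might be sketched as follows (untested; field names follow the rest of this thread, and `teller` and `inShift` are placeholder names):

```
(index="employee") OR (index="sales" AND Txn-Type="Sale")
| eval ts=strptime(coalesce('Log-in','Txn-Date-Time'),"%m-%d-%Y %H:%M:%S")
| sort 0 Location ts
| streamstats last("Staff-ID") as teller last("Log-in") as login last("Log-out") as logout by Location
| eval inShift=if(index=="sales" AND ts>=strptime(login,"%m-%d-%Y %H:%M:%S") AND ts<=strptime(logout,"%m-%d-%Y %H:%M:%S"), 1, 0)
| stats sum(inShift) as "Count of Txns" by teller Location
```

Because the events are sorted by time within each Location, `last()` carries the most recent employee's fields forward onto each subsequent sale event, which is what makes the range check possible without a join.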


niketn
Legend

@lguinn [Splunk], just FYI, the following is the query I used to mock up data for my SPL in the comment above. I have not posted mine as an answer because it will only work for the first 100 matches per location (the list() function is capped at 100 values).

| makeresults
| eval employee="1001 | 12-30-2017 11:30:00 | 12-30-2017 23:30:00 | LDP1;1002 | 12-30-2017 12:30:00 | 12-30-2017 13:30:00 | LDP1;1003 | 12-30-2017 22:30:00 | 12-30-2017 23:30:00 | LDP2"
| makemv employee delim=";"
| mvexpand employee
| eval employee=replace(employee," ","")
| makemv employee delim="|"
| eval "Staff-ID"=mvindex(employee,0), "Log-in"=mvindex(employee,1), "Log-out"=mvindex(employee,2), Location=mvindex(employee,3)
| fields "Staff-ID", "Log-in", "Log-out", Location
| eval _time=strptime('Log-in',"%m-%d-%Y%H:%M:%S")
| eval index="employee"
| append [| makeresults
| eval sales="12-30-2017 11:48:17 | LDP1 | Sale;12-30-2017 13:48:17 | LDP1 | Sale"
| makemv sales delim=";"
| mvexpand sales
| eval sales=replace(sales," ","")
| makemv sales delim="|"
| eval "Txn-Date-Time"=mvindex(sales,0), Location=mvindex(sales,1), "Txn-Type"=mvindex(sales,2)
| fields "Txn-Date-Time", Location, "Txn-Type"
| eval _time=strptime('Txn-Date-Time',"%m-%d-%Y%H:%M:%S")
| eval index="sales"]
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

lguinn2
Legend

How do you know which employee made which sale? Is it possible to have 2 employees at the same location at the same time?

0 Karma

cx233alvin
Explorer

You can determine which employee performed the sale if:
a. the date and time of the sale falls within the employee's log-in and log-out times; AND
b. the location of the sale equals the location of the employee.

No, there is only one employee assigned to a given time and location, so it is not possible to attribute one sale to multiple employees.
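In eval terms, condition (a) is just a range check once the timestamps are parsed to epoch time (a sketch; condition (b) is handled by matching on Location):

```
| eval txnTime=strptime('Txn-Date-Time',"%m-%d-%Y %H:%M:%S"),
       login=strptime('Log-in',"%m-%d-%Y %H:%M:%S"),
       logout=strptime('Log-out',"%m-%d-%Y %H:%M:%S")
| where txnTime>=login AND txnTime<=logout
```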

0 Karma

niketn
Legend

@cx233alvin, since Location is the only correlating field between employee and sales events (Staff-ID is missing from the sales data), and assuming there are at most 100 events per location (the list() function's cap), you can try the following:

 (index="sales" AND Txn-Type=Sale) OR (index="employee")
| stats list("Staff-ID") as "Staff-ID" list("Log-in") as "Log-in" list("Log-out") as "Log-out" list("Txn-Date-Time") as "Txn-Date-Time" values(index) as index by Location
| search index="employee" index="sales"
| eval data=mvzip('Staff-ID',mvzip('Log-in',mvzip('Log-out','Txn-Date-Time')))
| fields - "Staff-ID" "Log-in" "Log-out" "Txn-Date-Time"
| mvexpand data
| eval data=split(data,",")
| eval "Staff-ID"=mvindex(data,0), "Log-in"=mvindex(data,1), "Log-out"=mvindex(data,2), "Txn-Date-Time"=mvindex(data,3)
| fields - data
| eval "Log-in"=strptime('Log-in',"%m-%d-%Y%H:%M:%S"), "Log-out"=strptime('Log-out',"%m-%d-%Y%H:%M:%S"), "Txn-Date-Time"=strptime('Txn-Date-Time',"%m-%d-%Y%H:%M:%S")
| where 'Log-in'<='Txn-Date-Time' AND 'Txn-Date-Time'<='Log-out'
| fieldformat "Log-in"=strftime('Log-in',"%m-%d-%Y %H:%M:%S")
| fieldformat "Log-out"=strftime('Log-out',"%m-%d-%Y %H:%M:%S")
| fieldformat "Txn-Date-Time"=strftime('Txn-Date-Time',"%m-%d-%Y %H:%M:%S")
____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

bangalorep
Communicator

Hello! Is all the data in one index, or is it spread across multiple indexes?

0 Karma

cx233alvin
Explorer

They are from 2 different indices.

0 Karma
