
Performance question -- stats vs. lookup

responsys_cm
Builder

We're trying to build some searches that will enable us to do fraud detection for our customers. One thing we're looking at is building a profile of the browser agents used by individual users. That profile would be stored in a lookup table, and periodic searches would check browser agents against it.

So, there are a couple of ways we could do that. We could have a scheduled search take each event, look up its browser agent, and alert when it doesn't match. Or we could pipe the search results through stats to get a count of browser agents by user, and then do the lookup.

What's more efficient? The stats command or doing one lookup per event?

Thx.

Craig


hsesterhenn_spl
Splunk Employee

Hi,

Old question, but still valid.

The two options you mentioned actually answer different questions!

First, you need a baseline for the lookup: which user agents are "allowed".
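A minimal sketch of building that baseline, assuming the web events live in a hypothetical index named `web`, carry fields called `user` and `useragent`, and that a lookup definition named `valid_agents` (also hypothetical, backed by a CSV file) already exists. It summarizes 30 days of history into one row per user and user agent and writes the result with `outputlookup`:

```spl
index=web earliest=-30d@d latest=@d
| stats count AS seen_count, max(_time) AS last_seen BY user, useragent
| outputlookup valid_agents
```

Rerunning this on a schedule keeps the profile current; the 30-day window is an arbitrary choice for illustration.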

If you search, for example, all events from the last 15 minutes and then do a lookup for every event, that could be 50,000 lookups if you have 50,000 events in those 15 minutes (but it's still quite fast, because lookups are kept in memory when possible).
The advantage is that you can tell for every single event whether it has the right user agent or not.
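The per-event approach might look like the following sketch, assuming a hypothetical `web` index with `user`, `useragent` and `clientip` fields, and a hypothetical `valid_agents` lookup that holds known user/agent pairs plus a `seen_count` column. Every event gets its own lookup, and any event whose pair is missing from the table comes back with an empty `seen_count`:

```spl
index=web earliest=-15m@m latest=@m
| lookup valid_agents user useragent OUTPUT seen_count
| where isnull(seen_count)
| table _time, user, clientip, useragent
```

Because each surviving row is an individual event, `_time` shows exactly when each unexpected agent appeared.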

If you do a "stats count by useragent, clientip, ... | lookup valid_agents useragent", you usually reduce the number of lookups by an order of magnitude, but you might not know "at what time" the invalid request was issued (unless you add _time to the by clause).
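A sketch of the stats-first variant, assuming a hypothetical `web` index with `user` and `useragent` fields and a hypothetical `valid_agents` lookup with a `seen_count` column. The lookup runs once per distinct user/agent pair rather than once per event. Instead of adding `_time` to the by clause, this version keeps `min(_time)` and `max(_time)`, which still gives a rough window for when an unexpected agent was active:

```spl
index=web earliest=-15m@m latest=@m
| stats count, min(_time) AS first_seen, max(_time) AS last_seen BY user, useragent
| lookup valid_agents user useragent OUTPUT seen_count
| where isnull(seen_count)
| convert ctime(first_seen) ctime(last_seen)
```

The trade-off is one row per pair instead of one per event, at the cost of the exact timestamp of each individual request.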

So the big question is: What is your goal?

Both ways are valid options; you just have to be clear about what you want to get 🙂

HTH,

Holger


lguinn2
Legend

This isn't a real answer, so I'll just post it as a comment: you could try running it both ways and use the Search Job Inspector (the "i" button in 4.3) to look at the performance stats of each search. (BTW, the search performance stats are also written to a Splunk log and "splunked" into the _internal index, so you could look there as well.)
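For the _internal route, one possible comparison, assuming both variants are saved as scheduled searches with the hypothetical names `ua_check_per_event` and `ua_check_stats`, is to average the `run_time` recorded in the scheduler log for each. The `sourcetype=scheduler` field names used here are as they appear in recent Splunk versions and may differ in older releases such as 4.3 (which also lacks the `IN` operator; use `OR` there):

```spl
index=_internal sourcetype=scheduler savedsearch_name IN ("ua_check_per_event", "ua_check_stats") status=success
| stats count AS runs, avg(run_time) AS avg_run_time, max(run_time) AS max_run_time BY savedsearch_name
```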

I also suspect the performance will depend on the number of customers within your search time range: is it hundreds, thousands, or more? And how large will the lookup table be: a few thousand rows, tens of thousands?
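To answer those sizing questions for your own data, a sketch along these lines (again assuming a hypothetical `web` index with `user` and `useragent` fields and a hypothetical `valid_agents` lookup) reports, for a 15-minute window, the number of events (the lookups the per-event approach would do), the number of distinct user/agent pairs (the lookups left after `stats`), and the number of rows in the lookup table. Events missing either field are not counted in `user_agent_pairs`:

```spl
index=web earliest=-15m@m latest=@m
| stats count AS events, dc(user) AS users, dc(eval(user . "|" . useragent)) AS user_agent_pairs
| appendcols [| inputlookup valid_agents | stats count AS lookup_rows]
```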
