Faster way to find first occurrence of "duplicate" events

YisroelB
Explorer

I am trying to chart initial logins over time as follows:

index="abc" sourcetype="*apache_access" NOT remote_ident="-"
| table _time remote_ident
| stats earliest(_time) as _time BY remote_ident
| timechart count

but the search is excruciatingly slow.

Any performance tips would be appreciated.

Thanks,

-Yisroel

martin_mueller
SplunkTrust

In other words, you want a list of when each user has logged in for the first time?
There are many ways to improve performance. Here are two:

You could build a summary index that contains every user's first login, say, over the past day (depending on the time ranges you're looking at). Then use that condensed data to compute the overall first login whenever you need it, and keep adding new data to the summary every day (or so).
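A minimal sketch of the summary-index approach. It assumes a summary index named first_login_summary already exists (the name is hypothetical) and that this search is scheduled once a day, shortly after midnight, so that it condenses yesterday's raw events into one row per user. The text between triple backticks is an SPL inline comment.

```spl
index="abc" sourcetype="*apache_access" NOT remote_ident="-" earliest=-1d@d latest=@d
| fields _time remote_ident
| stats min(_time) as _time by remote_ident ```each user's first login yesterday```
| collect index=first_login_summary ```hypothetical summary index name```
```

The chart then only reads the condensed rows and takes the minimum again across all days, run over a time range long enough to cover the whole summary: index=first_login_summary | stats min(_time) as _time by remote_ident | timechart count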

Alternatively, you could build a growing lookup that stores each user's first login: a scheduled search regularly looks at (say) yesterday's logins and adds any users that haven't appeared before. Here's a lovely example: http://blogs.splunk.com/2011/01/11/maintaining-state-of-the-union/
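A sketch of that growing lookup, in the spirit of the linked "state of the union" pattern. It assumes a hypothetical lookup file first_logins.csv and a daily schedule: yesterday's first logins are appended to the stored ones, and keeping the minimum per user means existing users retain their original first login while new users are added. The text between triple backticks is an SPL inline comment.

```spl
index="abc" sourcetype="*apache_access" NOT remote_ident="-" earliest=-1d@d latest=@d
| fields _time remote_ident
| stats min(_time) as first_login by remote_ident ```yesterday's first login per user```
| inputlookup append=true first_logins.csv ```previously known users (hypothetical lookup file)```
| stats min(first_login) as first_login by remote_ident ```keep the older timestamp for known users```
| outputlookup first_logins.csv
```

The lookup file has to exist before the first run; one way to seed it is to run the same search once over all time without the inputlookup and second stats lines. Charting afterwards needs no raw events at all: | inputlookup first_logins.csv | eval _time=first_login | timechart count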

Both ways will work. Technically I prefer the lookup, because it does one job and one job only - compute first logins - and is extremely easy and fast to use once set up properly.

YisroelB
Explorer

Thanks again, Martin.

martin_mueller
SplunkTrust

Without any form of caching, the search itself probably scans through a very large number of events. You can confirm this in the search inspector by looking at the time spent on each component; I'd expect a huge fraction of it to be spent fetching events.

If that is the case, you can speed things up by giving the initial search better filters. For example, is there a login page that is always visited first? If so, you only need to search for requests to that page and can ignore everything else.
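A sketch of a tighter base search. It assumes the login page lives under a path starting with /login and that the Apache events have a uri_path field extracted; both are assumptions about this particular environment, not something stated in the thread. The text between triple backticks is an SPL inline comment.

```spl
index="abc" sourcetype="*apache_access" NOT remote_ident="-" uri_path="/login*" ```hypothetical login path```
| fields _time remote_ident
| stats min(_time) as _time by remote_ident
| timechart count
```

The fewer events the base search matches, the less time is spent fetching them, so the benefit depends on how small a share of all requests go to that page.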

YisroelB
Explorer

Thank you, Martin.

In this case I am looking for recent, near-real-time data (today), so option two seems workable. After emptying my lookup at midnight each day, I could search fairly frequently (maybe every 10-30 minutes) for users not already in the lookup, add them to it along with the time of their first appearance, and return the new contents of the lookup.
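A sketch of that incremental update, assuming a hypothetical lookup file todays_first_logins.csv that already exists and a search scheduled every 30 minutes. Rather than relying on a separate midnight job to empty the lookup, this variant discards anything from before today on each run, which should have the same effect. The first where clause filters events before aggregating, so a user seen just before and just after midnight still gets today's first login. The text between triple backticks is an SPL inline comment.

```spl
index="abc" sourcetype="*apache_access" NOT remote_ident="-" earliest=-35m latest=now
| where _time >= relative_time(now(), "@d") ```ignore events from before midnight```
| stats min(_time) as first_login by remote_ident
| inputlookup append=true todays_first_logins.csv ```hypothetical lookup file```
| where first_login >= relative_time(now(), "@d") ```drop yesterday's entries from the lookup```
| stats min(first_login) as first_login by remote_ident
| outputlookup todays_first_logins.csv
```

The 35-minute window deliberately overlaps the 30-minute schedule so that slightly late events are less likely to be missed; the overlap is harmless because only each user's minimum timestamp is kept. A dashboard panel can then chart today's first logins straight from the lookup: | inputlookup todays_first_logins.csv | eval _time=first_login | timechart count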

It's a good solution. Thanks.

The heart of my question, however, was whether, from a pure syntax standpoint, the response time of a direct search against the existing data sources could be improved by taking a different approach.
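One syntax-level change worth testing, offered as a sketch rather than a guaranteed fix: in the original search, table is a transforming command that runs on the search head, so every matching event has to be shipped there before stats can reduce it. Replacing it with fields, which is distributable streaming, should let the indexers compute partial stats results themselves and send back far less data. The text between triple backticks is an SPL inline comment.

```spl
index="abc" sourcetype="*apache_access" NOT remote_ident="-"
| fields _time remote_ident ```distributable, unlike table```
| stats earliest(_time) as _time by remote_ident
| timechart count
```

The search still has to read every matching raw event, so over a long time range it will likely remain slow; that is why caching first logins in a summary index or lookup tends to help much more.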
