Subsearch limits. Better approach?

gbarwis · ‎03-01-2011

I'm a Splunk rookie, so i apologize if I'm unclear on my question.

I have one index which contains, among other things, both a username and a clientip for each successful login. This index gets approximately 30,000 entries per hour.

I have a second index which contains, among other things, both the same username and the site (AMS or DAL) for each successful login. This index gets approximately 40,000 entries per hour.

What I need to accomplish is a 30-day search that will output a list of all unique IP addresses that connected to one of the sites, so I can GeoIP it and produce a list of countries-of-origin logging in to each site.

Here is the query that I've come up with. This works like a champ, if I search over a 15-minute interval. Problem is, I need to search over the past 30 days. How can I rewrite this to eliminate the need for a subsearch, allowing me to get to the results I need? Should I use a 'transaction' to do this or is that out of place too?

Again, Splunk rookie here... it's taken me hours to get this far, even though I expect there is a simple and obvious solution.

index=auth NOT clientip="10.*" [search index="product12" "Completed login" host="DAL*" | rex "(?i) for user (?P<username>[^ ]+)" | dedup username | fields username] | geoip clientip | dedup clientip

woodcock · ‎04-13-2016

Like this:

(index=auth NOT clientip="10." OR (index="product12" "Completed login" host="DAL")
| rex "(?i) for user (?P<usernameFromProduct12>[^ ]+)"
| eval username=coalesce(username, usernameFromProduct12)
| stats dc(index) AS index_count values(clientip) AS clientip BY username
| where dc(index)>1
| dedup clientip | geoip clientip

lpolo · ‎07-15-2011

I have a similar problem. I cannot run a join for time frames larger than ~15 min. No matter how you set up your joins or subsearches Splunk will not be able to present the correct result set if the number of log events per minute is high.

David · ‎03-01-2011

If you have username in both places, I would run this with a join, and that might improve performance. Realistically though, if you need data from both indexes, it's going to take the same (or similar) amount of time, regardless of whether you use a subsearch, join or toss it all into one big search. The subsearch / join only adds a significant amount of time if you're traversing the same data twice, which you're not doing in the above.

I would imagine you're going to get an improvement by switching the dedup and the geoip, though, sending fewer events through the geoip process.

I'm not sure if you'd get improved performance, but you could try it with a join:

index=auth NOT clientip="10.*" 
    | join username [
                search index="product12" "Completed login" host="DAL*" 
                       | rex "(?i) for user (?P<username>[^ ]+)" 
                       | dedup username 
                       | fields username
                     ]
    | dedup clientip
    | geopip clientip

What would likely make a bigger difference for your environment is to put it in a summary index. That way you can batch the reporting for off hours, and just query the much smaller summary index when you are actually looking at your report. Take a look at:

http://www.splunk.com/wiki/Deploy:Summary_Indexing

for more.

Subsearch limits. Better approach?

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

.conf24 | Registration Open!

ICYMI - Check out the latest releases of Splunk Edge Processor