Solved: Comparing two data sets from the same timeframe an...

Tisiphone_1 · ‎06-18-2010

I am trying to compare the results of two searches that share a common timeframe and index, with a negation. The common resulting field may occur multiple times in both searches. As an example:

Search 1:

index="stuff" status="LoginSuccessful"

and Search 2 is:

index="stuff" status="LoginFailed"

If username is in an event in search 2 but NOT in an event in search 1, I want to see it. (So, show me all of the users that had a failed login and never had a successful login.)

The only way I have found to do this is to output one set of results to a csv and use lookup to search one. Even if this were two separate indices, I'm not clear on how you would negate the results of an entire join.

I figure there has to be a simpler way to do this...

gkanapathy · ‎06-19-2010

There are several easier ways to do this. An efficient way is:

 index=stuff status=loginsuccessful OR status=loginfailed | stats count(eval(status=="LoginSuccessful")) as c_success, count((eval(status=="LoginFailed"))) as c_failure by user | where c_failure > 0 AND c_success == 0

Similar (but somewhat less amenable to map-reduce treatment than the previous, if you have multiple indexer nodes) is:

 index=stuff status=loginsuccessful OR status=loginfailed | transaction user | where status=="LoginFailed" AND NOT status=="LoginSuccessful"

And the obvious way (which is just a concise version of what you did) is to use a subsearch:

 index=stuff status=loginfailed NOT [ search index=stuff status=loginsuccessful | dedup user | fields user ]

But this is relatively inefficient and is subject to limits on subsearch and format that must be increased in the system if you have a lot of users.

View solution in original post

gkanapathy · ‎06-19-2010

There are several easier ways to do this. An efficient way is:

 index=stuff status=loginsuccessful OR status=loginfailed | stats count(eval(status=="LoginSuccessful")) as c_success, count((eval(status=="LoginFailed"))) as c_failure by user | where c_failure > 0 AND c_success == 0

Similar (but somewhat less amenable to map-reduce treatment than the previous, if you have multiple indexer nodes) is:

 index=stuff status=loginsuccessful OR status=loginfailed | transaction user | where status=="LoginFailed" AND NOT status=="LoginSuccessful"

And the obvious way (which is just a concise version of what you did) is to use a subsearch:

 index=stuff status=loginfailed NOT [ search index=stuff status=loginsuccessful | dedup user | fields user ]

But this is relatively inefficient and is subject to limits on subsearch and format that must be increased in the system if you have a lot of users.

Tisiphone_1 · ‎06-19-2010

Thanks! I had my syntax mixed up on the last. Now it works!

Comparing two data sets from the same timeframe and index?

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!