Splunk Search

Count unique values per field

topdeck
Explorer

Hello, imagine you have two fields: IP, ACCOUNT

An IP can access any number of ACCOUNTs, and an ACCOUNT can be accessed by any number of IPs.

For each IP, the number of ACCOUNT it accesses.
For each ACCOUNT, the number of IPs that access it.

Potentially easy.

Show the number of ACCOUNTs accessed by each IP, where those ACCOUNTs are accessed by more than one IP and the other IPs accessing them are not the same across those ACCOUNTs.

Confused? I'd like to find IPs accessing a lot of accounts where those accounts are also being accessed by more than one IP and the other IPs accessing those accounts are not all the same.

sideview
SplunkTrust

To start simple -

For each IP, the number of ACCOUNT it accesses.

<search terms> | stats dc(ACCOUNT) by IP

likewise,

<search terms> | stats dc(IP) by ACCOUNT
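
If it helps to see what dc() is doing in those two searches, here is a rough Python sketch on a tiny made-up data set. The sample events and the helper name distinct_count are assumptions for illustration only; they are not part of Splunk or of the original question.

```python
from collections import defaultdict

# Hypothetical sample events as (IP, ACCOUNT) pairs, invented for illustration.
events = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.1", "alice"),  # repeated event; must not be counted twice
    ("10.0.0.2", "alice"),
]


def distinct_count(pairs, key_index, value_index):
    """Rough equivalent of `stats dc(<value>) by <key>` (hypothetical helper)."""
    seen = defaultdict(set)
    for pair in pairs:
        # A set keeps each value once per key, which is what dc() counts.
        seen[pair[key_index]].add(pair[value_index])
    return {key: len(values) for key, values in seen.items()}


# Like: stats dc(ACCOUNT) by IP
accounts_per_ip = distinct_count(events, 0, 1)
# Like: stats dc(IP) by ACCOUNT
ips_per_account = distinct_count(events, 1, 0)

print(accounts_per_ip)
print(ips_per_account)
```

The repeated ("10.0.0.1", "alice") event is there on purpose: a distinct count ignores it, whereas a plain count would not.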

Obviously, those are much simpler than what you're asking for.

Here's the best approach I can think of. Breaking the following search down in English: we take the unique combinations of ACCOUNT and IP (using stats). We then pipe these rows through eventstats so that each row gets a 'distinctIPs' field. The distinctIPs value is the number of distinct IP values that accessed that row's ACCOUNT. Then we treat this as a rough weighting and just add up the values for each IP. It's kind of a ridiculous field name, but for clarity I've called it "totalDistinctIPsAccessedByAccountsTheyAccessed".

<search terms> | stats count by ACCOUNT IP | eventstats dc(IP) as distinctIPs by ACCOUNT | stats count sum(distinctIPs) as totalDistinctIPsAccessedByAccountsTheyAccessed by IP | sort - totalDistinctIPsAccessedByAccountsTheyAccessed

In the end you get a list of the top IP addresses that accessed LOTS of accounts, weighted heavily towards those where the accessed accounts were themselves accessed by a LOT of IPs.
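
To make the weighting concrete, here is a minimal Python sketch of the same four steps (unique pairs, distinct IPs per account, sum per IP, sort) on a small invented data set. The sample events and the function name weighted_ip_scores are hypothetical, and this only illustrates the arithmetic; it is not how Splunk runs the search.

```python
from collections import defaultdict

# Hypothetical sample events as (IP, ACCOUNT) pairs, invented for illustration.
events = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.1", "carol"),
    ("10.0.0.2", "alice"),
    ("10.0.0.3", "alice"),
    ("10.0.0.3", "bob"),
    ("10.0.0.1", "alice"),  # duplicate event, collapsed in step 1
]


def weighted_ip_scores(pairs):
    """Return (ip, account_count, weighted_total) rows, highest total first."""
    # Step 1, like `stats count by ACCOUNT IP`: one row per unique pair.
    unique_pairs = set(pairs)

    # Step 2, like `eventstats dc(IP) as distinctIPs by ACCOUNT`.
    ips_by_account = defaultdict(set)
    for ip, account in unique_pairs:
        ips_by_account[account].add(ip)

    # Step 3, like `stats count sum(distinctIPs) ... by IP`.
    totals = defaultdict(lambda: [0, 0])  # ip -> [account count, weighted total]
    for ip, account in unique_pairs:
        totals[ip][0] += 1
        totals[ip][1] += len(ips_by_account[account])

    # Step 4, like `sort - <total field>`: descending by weighted total.
    rows = [(ip, count, total) for ip, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


scores = weighted_ip_scores(events)
for ip, account_count, weighted_total in scores:
    print(ip, account_count, weighted_total)
```

Here alice is accessed by 3 IPs, bob by 2 and carol by 1, so 10.0.0.1 scores 3 + 2 + 1 = 6, 10.0.0.3 scores 3 + 2 = 5, and 10.0.0.2 scores 3. Note that an account's distinctIPs value includes the IP itself, so an account only that IP touches still adds 1.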

Phew. Hopefully I'm close. 😃

sideview
SplunkTrust

(Note - it's best to click 'comment on this answer' under my answer, rather than posting a new answer as a comment. Things get very confusing when the order of the answers changes later.)

topdeck
Explorer

Thanks Nick, I'll take a stab using your suggestions. I really wish I could do this in something like Perl or Python, but the data set is too large.
