Count unique values per field

topdeck
Explorer

Hello, imagine you have two fields: IP, ACCOUNT

An IP can access any number of ACCOUNT, an ACCOUNT can be accessed by any number of IP.

For each IP, the number of ACCOUNTs it accesses.
For each ACCOUNT, the number of IPs that access it.

Potentially easy.

Harder: show the number of ACCOUNTs accessed by each IP, where those ACCOUNTs are accessed by more than one IP and the other IPs accessing them are not the same from one ACCOUNT to the next.

Confused? I'd like to find IPs accessing a lot of accounts where those accounts are also being accessed by more than one IP, and the other IPs accessing those accounts are not all the same.

sideview
SplunkTrust

To start simple -

For each IP, the number of ACCOUNT it accesses.

<search terms> | stats dc(ACCOUNT) by IP

likewise,

<search terms> | stats dc(IP) by ACCOUNT
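
If it helps to check the logic on a handful of rows before running it against the full index, here is a minimal Python sketch of what those two searches compute. It is only an illustration under assumptions: the function name `distinct_counts` and the toy events are hypothetical, and on real data volumes the SPL above is the thing to run.

```python
from collections import defaultdict


def distinct_counts(events):
    """Mirror `stats dc(ACCOUNT) by IP` and `stats dc(IP) by ACCOUNT`.

    `events` is an iterable of (ip, account) pairs; repeats are allowed.
    """
    accounts_by_ip = defaultdict(set)
    ips_by_account = defaultdict(set)
    for ip, account in events:
        # Sets drop repeated events, which is what dc() does.
        accounts_by_ip[ip].add(account)
        ips_by_account[account].add(ip)
    accounts_per_ip = {ip: len(accts) for ip, accts in accounts_by_ip.items()}
    ips_per_account = {acct: len(ips) for acct, ips in ips_by_account.items()}
    return accounts_per_ip, ips_per_account


# Hypothetical toy events (assumption: not data from this thread).
toy_events = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.1", "alice"),  # repeated event, counted once
    ("10.0.0.2", "alice"),
]

accounts_per_ip, ips_per_account = distinct_counts(toy_events)
print(accounts_per_ip)   # distinct accounts per IP
print(ips_per_account)   # distinct IPs per account
```

Here 10.0.0.1 touches two distinct accounts even though it produced three events, which is the difference between dc() and a plain count.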

Obviously, those are much simpler than what you're asking for.

Here's the best approach I can think of. Breaking the following search down in English: we take the unique combinations of ACCOUNT and IP (using stats). We then pipe these rows through eventstats so that each row gets a 'distinctIPs' field, whose value is the number of distinct IPs that accessed that row's ACCOUNT. Then we treat this as a rough weighting and add up the values for each IP; because each row is already a unique ACCOUNT/IP pair, the count in that final stats is the number of distinct accounts the IP accessed. It's kind of a ridiculous field name, but for clarity I've called the sum "totalDistinctIPsAccessedByAccountsTheyAccessed".

<search terms> | stats count by ACCOUNT IP | eventstats dc(IP) as distinctIPs by ACCOUNT | stats count sum(distinctIPs) as totalDistinctIPsAccessedByAccountsTheyAccessed by IP | sort - totalDistinctIPsAccessedByAccountsTheyAccessed

In the end you get a list of the top IP addresses that accessed LOTS of accounts, weighted heavily towards those where the accessed accounts were themselves accessed by a LOT of IPs.

phew. Hopefully I'm close. 😃
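
For anyone who wants to see what the weighting does on a few rows, the stats / eventstats / stats pipeline above can be sketched in Python roughly as follows. This is a toy illustration under assumptions, not a replacement for the search: `weighted_ip_scores` and the sample events are hypothetical names and data, and the tie-break on IP is an arbitrary choice made only so the sketch's output is deterministic.

```python
from collections import defaultdict


def weighted_ip_scores(events):
    """Return rows of (ip, account_count, total_distinct_ips), highest total first.

    `events` is an iterable of (ip, account) pairs; repeats are allowed.
    """
    # Step 1: `stats count by ACCOUNT IP` -> unique (account, ip) combinations.
    pairs = {(account, ip) for ip, account in events}

    # Step 2: `eventstats dc(IP) as distinctIPs by ACCOUNT`.
    ips_by_account = defaultdict(set)
    for account, ip in pairs:
        ips_by_account[account].add(ip)

    # Step 3: `stats count sum(distinctIPs) ... by IP`.
    account_count = defaultdict(int)
    total = defaultdict(int)
    for account, ip in pairs:
        account_count[ip] += 1
        total[ip] += len(ips_by_account[account])

    # Step 4: `sort - total`; ties broken by IP string (an assumption).
    rows = [(ip, account_count[ip], total[ip]) for ip in total]
    return sorted(rows, key=lambda row: (-row[2], row[0]))


# Hypothetical toy events (assumption: not data from this thread).
sample_events = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.1", "carol"),
    ("10.0.0.2", "alice"),
    ("10.0.0.3", "alice"),
    ("10.0.0.3", "bob"),
    ("10.0.0.1", "alice"),  # repeated event, collapsed in step 1
]

for row in weighted_ip_scores(sample_events):
    print(row)
```

In this sample, alice is accessed by three IPs, bob by two, and carol by one, so 10.0.0.1 scores 3 + 2 + 1 = 6 across its three accounts. Note that the sum still includes accounts that only one IP touches (carol here); if those should be excluded, a filter on distinctIPs before the final sum would be one way to do it.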

sideview
SplunkTrust

(Note: it's best to click 'comment on this answer' under my answer, rather than posting a new answer as a comment. Things get very confusing when the order of the answers changes later.)

topdeck
Explorer

Thanks Nick, I'll take a stab using your suggestions. I really wish I could do this in something like Perl or Python, but the data set is too large.
