Splunk Search

How to get count of unique values by group?

indusbull
Explorer

Hi

I am working on query to retrieve count of unique host IPs by user and country. The country has to be grouped into Total vs Total Non-US. The final result would be something like below -

UserId, Total Unique Hosts, Total Non-US Unique Hosts
user1, 42, 54
user2, 23, 95

So far I have below query which works but its very slow. Is there any better and faster way to achieve desired result ? Thanks

index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST | search Country!=US | stats estdc(HOST) as total_non_us by USERID
| join USERID type="left"
    [
       search index=customindex sourcetype=custom src
         | iplocation allfields=true lang=code HOST | search Country=US | stats estdc(HOST) as total_us by USERID
    ] 
| fillnull
| eval total = total_non_us + total_us
0 Karma
1 Solution

elliotproebstel
Champion

This should run more efficiently by avoiding the join command and duplicate searching:

index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST 
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL)
| stats estdc(us_host) AS total_us, estdc(non_us_host) AS total_non_us BY USERID
| fillnull
| eval total = total_non_us + total_us

View solution in original post

elliotproebstel
Champion

This should run more efficiently by avoiding the join command and duplicate searching:

index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST 
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL)
| stats estdc(us_host) AS total_us, estdc(non_us_host) AS total_non_us BY USERID
| fillnull
| eval total = total_non_us + total_us

niketn
Legend

@elliotproebstel, I would perform stats first and then apply iplocation to aggregated fields.

http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Geostats#Usage
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_sea...

index=customindex sourcetype=custom src 
| stats count BY USERID HOST 
| iplocation allfields=true lang=code HOST 
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL) 
| stats dc(us_host) AS total_us, dc(non_us_host) AS total_non_us BY USERID
| addtotals row=t col=f

@indusbull, can you explain why you are trying to use estdc() and not dc()?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

milanpatel78
New Member

Pandas nunique() is used to get a count of unique values. It returns the Number of pandas unique values in a column. Pandas DataFrame groupby() method is used to split data of a particular dataset into groups based on some criteria. The groupby() function split the data on any of the axes.

0 Karma

indusbull
Explorer

@niketnilay I was using dc initially but since it was taking long time I decided to try estdc since splunk doc mentions that estdc can give better performance.

0 Karma

rushabh92
New Member

Pandas nunique() is used to get a count of unique values. It returns the Number of pandas unique values in a column.  Pandas DataFrame groupby() method is used to split data of a particular dataset into groups based on some criteria. The groupby() function split the data on any of the axes. 

Tags (2)
0 Karma

elliotproebstel
Champion

Good point, thanks!

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...