Hi
I am working on query to retrieve count of unique host IPs by user and country. The country has to be grouped into Total vs Total Non-US. The final result would be something like below -
UserId, Total Unique Hosts, Total Non-US Unique Hosts
user1, 42, 54
user2, 23, 95
So far I have below query which works but its very slow. Is there any better and faster way to achieve desired result ? Thanks
index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST | search Country!=US | stats estdc(HOST) as total_non_us by USERID
| join USERID type="left"
[
search index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST | search Country=US | stats estdc(HOST) as total_us by USERID
]
| fillnull
| eval total = total_non_us + total_us
This should run more efficiently by avoiding the join
command and duplicate searching:
index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL)
| stats estdc(us_host) AS total_us, estdc(non_us_host) AS total_non_us BY USERID
| fillnull
| eval total = total_non_us + total_us
This should run more efficiently by avoiding the join
command and duplicate searching:
index=customindex sourcetype=custom src
| iplocation allfields=true lang=code HOST
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL)
| stats estdc(us_host) AS total_us, estdc(non_us_host) AS total_non_us BY USERID
| fillnull
| eval total = total_non_us + total_us
@elliotproebstel, I would perform stats first and then apply iplocation to aggregated fields.
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Geostats#Usage
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Lookup#Optimizing_your_lookup_sea...
index=customindex sourcetype=custom src
| stats count BY USERID HOST
| iplocation allfields=true lang=code HOST
| eval us_host=if(Country="US", HOST, NULL), non_us_host=if(Country!="US", HOST, NULL)
| stats dc(us_host) AS total_us, dc(non_us_host) AS total_non_us BY USERID
| addtotals row=t col=f
@indusbull, can you explain why you are trying to use estdc()
and not dc()
?
Pandas nunique() is used to get a count of unique values. It returns the Number of pandas unique values in a column. Pandas DataFrame groupby() method is used to split data of a particular dataset into groups based on some criteria. The groupby() function split the data on any of the axes.
@niketnilay I was using dc initially but since it was taking long time I decided to try estdc since splunk doc mentions that estdc can give better performance.
Pandas nunique() is used to get a count of unique values. It returns the Number of pandas unique values in a column. Pandas DataFrame groupby() method is used to split data of a particular dataset into groups based on some criteria. The groupby() function split the data on any of the axes.
Good point, thanks!