I have a log file where each line has an itemId and a clusterId.
When I run the following two queries
| stats count(itemId) as clusterSize by clusterId
| sort - clusterSize
vs
| stats list(itemId) AS items BY clusterId
| eval clusterSize=mvcount(items)
| sort -clusterSize
I get different results. I don't know if it's a coincidence, but in the second query the largest clusterSize values are exactly 100.
Does anybody have an idea?
Per the Splunk documentation, list():
Returns a list of up to 100 values of the field X as a multivalue entry.
So the list() function returns at most 100 values. If a cluster has more than 100 values of itemId, the list is truncated and mvcount(items) can never exceed 100, which is why the second query gives different results.
http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/SearchReference/CommonStatsFunctions#Supporte...
If you're looking for a total count of itemIds by clusterId, the first query works well. If you want to know how many unique itemIds are in each clusterId, try | stats dc(itemId) as clusterSize by clusterId
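As a sketch, both numbers can be computed in a single stats call using the field names from the question. If the item list itself is still needed, keep list() for display only and take the size from count(), since mvcount() on the list is subject to the 100-value cap. The name uniqueItems is made up for this example.

```spl
| stats count(itemId) AS clusterSize dc(itemId) AS uniqueItems list(itemId) AS items BY clusterId
| sort - clusterSize
```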
Hey,
list(X)
Returns a list of up to 100 values of the field X as a multivalue entry. The order of the values reflects the order of input events.
Have a look at the official docs: http://docs.splunk.com/Documentation/Splunk/7.0.1/SearchReference/Multivaluefunctions#list.28X.29
So the output of your first query is correct. Your second query tops out at a clusterSize of exactly 100 because list() is capped at 100 values, so mvcount(items) undercounts any cluster with more items than that. That is why the two results differ.
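A quick way to see the cap in isolation is a search over generated events. This is only a minimal sketch: the cluster name "A", the event count of 250, and the field names countSize and listSize are made up for illustration, and it assumes the list() limit on your instance is still at its default of 100.

```spl
| makeresults count=250
| eval clusterId="A", itemId=random()
| stats count(itemId) AS countSize list(itemId) AS items BY clusterId
| eval listSize=mvcount(items)
| table clusterId countSize listSize
```

Under that assumption, countSize should come out as 250 while listSize stops at 100.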
Let me know if this helps!