I have a log file of the following sort:
vendor productId clusterId
A 1 1
B 2 1
A 3 1
C 4 4
D 8 8
D 9 8
D 10 10
Now I would like to select those vendors who have at least one productId that is contained in a cluster of size at least k. The cluster size corresponds to the number of rows with the same clusterId.
So, in the example above, companies A and B both appear in a cluster of size 3, D in a cluster of size 2 and company C in a cluster of size 1.
In SQL I would solve this using sub-queries, but I am not sure how to tackle this in Splunk.
Try this (replace k with your minimum cluster size and adjust the where clause as needed):
<your current search giving the above output with fields vendor productId clusterId>
| eventstats count as clusterSize by clusterId
| where clusterSize>=k
| stats count as productsCount by vendor
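If you want to try the logic without the real log, here is a run-anywhere sketch that rebuilds the sample rows with makeresults. The field name raw, the ";" delimiter, the rex pattern and the threshold 2 (standing in for k) are my assumptions for illustration only; they are not part of your data.

```
| makeresults
| eval raw="A 1 1;B 2 1;A 3 1;C 4 4;D 8 8;D 9 8;D 10 10"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<vendor>\S+) (?<productId>\d+) (?<clusterId>\d+)$"
| eventstats count as clusterSize by clusterId
| where clusterSize>=2
| stats count as productsCount by vendor
```

eventstats adds the cluster size to every row without collapsing the rows, which is what stands in for the SQL sub-query here. With a threshold of 2 this should return A (2 products), B (1) and D (2); C is dropped because cluster 4 has only one row. To see the size at each row, replace the last two lines with | table vendor productId clusterId clusterSize.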
How do you determine the cluster size? I mean, what is the logic for determining the cluster size for the input you have given?
The cluster size corresponds to the number of rows with the same clusterId.
I did not understand "A and B both appear in a cluster of size 3, D in a cluster of size 2 and company C in a cluster of size 1."
To me this does not match the number of rows with the same clusterId. Kindly explain in detail, and also put the corresponding size in a table, i.e. what the size is at each row.