Here is my use case:
Log lines are comma-separated and have teamname, location, and other fields.
I would like to get a list of unique teamnames.
The problem is that the index is growing and has 25+ million records in it.
So dedup teamname or stats values(teamname) takes minutes to calculate.
The same operation in mysql takes less than a second.
I am sure there is a way to quickly get the unique values from this index.
What is the best practice for such scenarios?
thanks
Here is what I would implement in such scenario
1) Create a lookup table file that will contain the team names. Let's call it lookup_teamname.csv, with a single field called teamname.
2) Create a scheduled search that runs every night at 1:00 AM, with earliest=-1d@d and latest=@d, to take all the events received yesterday and get the unique teamname values. These values then populate the lookup table (after removing duplicates).
Sample search:
your base search | stats count by teamname | table teamname | append [|inputlookup lookup_teamname.csv | table teamname] | stats count by teamname | table teamname | outputlookup append=f lookup_teamname.csv
3) Use this lookup to populate your dropdown.
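For step 3, a minimal sketch of the search that could populate the dropdown, assuming the lookup file from step 1 exists and is readable from the dashboard's app. The sort is optional and only there to give a stable ordering; the 0 lifts the default result limit of sort.

```
| inputlookup lookup_teamname.csv
| fields teamname
| sort 0 teamname
```

Because this reads a small CSV rather than scanning the index, it should return quickly regardless of how many events the index holds.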
Yes, you can output all the values to a CSV file and use that CSV to populate teamname.
.............|outputcsv fileName
Hi, to deal with the long execution time, I think you can simply save your search as a report and accelerate it, so that a summary is built and your values are always available.
Thanks to all for answering my question on this.
This solution worked the best; I have not yet played with the lookup appending solution (provided by somesoni2).
It seems Splunk is smart enough to match the exact search string to the one in the saved report.
Thanks again to all.
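As a rough sketch of the accelerated-report approach: report acceleration applies to saved searches built around a transforming command such as stats, so the saved report could look something like the following. The index name your_index is a placeholder, not something taken from this thread.

```
index=your_index
| stats count by teamname
| fields teamname
```

The dashboard dropdown would then run the same search string, which (as observed above) appears to be what lets Splunk use the report's summary instead of rescanning the raw events.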
Hi,
The easy way to do this is to use dedup teamname or stats values(teamname) and narrow the time range in the time range picker when you run your search. Or you can accelerate your search.
Do you have to find the distinct team names across data for all time?
Yes - CSV data from many servers gets forwarded to the same index.
teamname is a dropdown in the dashboard. To populate it, I am doing a unique/distinct query.
Even if I reduce the duration to 7 days, the record count is around 25-30 million. Doing a unique field search means that every search has to go through all of them and then calculate.
There must be a better way to do this type of search.
stats values(teamname) is OK, but why do the same search again and again?
Could we take the values, store them somewhere, and use them for the dashboard?
I am not sure what the best practice is in Splunk. Hence this post.