Splunk Search

Search unique values - from index that has 25+ million lines

vdevarayan
Path Finder

Here is my use case:
Log lines are comma-separated and have teamname, location, and other fields.

I would like to get a list of unique teamnames.
The problem is that the index is growing and has 25+ million records in it.
So dedup teamname or stats values(teamname) takes minutes to calculate.
The same operation in MySQL takes less than a second.

I am sure there is a way to quickly get the unique values from this index.
What is the best practice for such scenarios?

Thanks

1 Solution

stephane_cyrill
Builder

Hi, to deal with the long execution time, I think you can simply save your search and accelerate it. Splunk then keeps a summary up to date in the background, so your values are always available.


somesoni2
Revered Legend

Here is what I would implement in such a scenario:

1) Create a lookup table file that will contain the team names. Let's call it lookup_teamname.csv, with a single field called teamname.
2) Create a scheduled search that runs every night at 1:00 AM with earliest=-1d@d and latest=@d, so it takes all the events received yesterday and gets the unique teamname values. These values then populate the lookup table (after removing duplicates).
Sample search:

your base search | stats count by teamname | table teamname | append [|inputlookup lookup_teamname.csv | table teamname] | stats count by teamname | table teamname | outputlookup append=f lookup_teamname.csv 

3) Use this lookup to populate your dropdown.
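
To make the logic of the scheduled search easier to follow, here is a minimal Python sketch of the same idea outside Splunk: take only the newest batch of events, union its team names with the names already stored in the lookup, and write the deduplicated list back. The function name merge_teamnames and the sample data are hypothetical and for illustration only; this is not how Splunk implements outputlookup.

```python
import csv
import io


def merge_teamnames(lookup_csv: str, new_events: list[dict]) -> str:
    """Return updated lookup CSV text: stored names plus names seen in new_events."""
    # Names already stored in the lookup (single column with header 'teamname').
    known = {row["teamname"] for row in csv.DictReader(io.StringIO(lookup_csv))}
    # Add distinct names from the new batch only; older events are never rescanned.
    known.update(e["teamname"] for e in new_events if e.get("teamname"))
    # Write the deduplicated, sorted list back out in the same one-column format.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["teamname"])
    for name in sorted(known):
        writer.writerow([name])
    return out.getvalue()


# Hypothetical sample data: the current lookup and yesterday's events.
lookup = "teamname\nalpha\nbravo\n"
events = [
    {"teamname": "bravo", "location": "nyc"},
    {"teamname": "charlie", "location": "sfo"},
    {"teamname": "charlie", "location": "lon"},
]
updated = merge_teamnames(lookup, events)
print(updated)
```

The point of the design is that each nightly run only touches one day of events, while the dropdown reads a tiny file instead of scanning millions of records.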

stephane_cyrill
Builder

Yes, you can output all the values to a CSV file and use that file to populate teamname.
............. | outputcsv fileName


vdevarayan
Path Finder

Thanks to all for answering my question on this.

The accepted solution from stephane_cyrill (saving the search and accelerating it) worked best. I have not yet tried the lookup-appending solution provided by somesoni2.
It seems Splunk is smart enough to match an identical search string to the one in the saved report.

Thanks again to all.


ngatchasandra
Builder

Hi,

The easy way to do this is to use dedup teamname or stats values(teamname) and narrow the time range in the time range picker when you run your search. Alternatively, you can accelerate your search.


somesoni2
Revered Legend

Do you need to find the distinct team names across data for all time?


vdevarayan
Path Finder

Yes. CSV data from many servers gets forwarded to the same index.
teamname is a dropdown in the dashboard. To populate it, I run a unique/distinct query.

Even if I reduce the time range to 7 days, the record count is around 25-30 million. A unique-field search has to go through all of those events and then calculate the result.
There must be a better way to do this type of search.
stats values(teamname) is OK, but why run the same search again and again?
Could we take the values, store them somewhere, and use them for the dashboard?
I am not sure what the best practice is in Splunk, hence this post.
