Hi,
I am working on a distributed Splunk environment. I have created an app and a separate indexer for this app to load data. The data shows up in the Data Summary, but when I go to search and run, for example, index=abc, it takes 20 minutes to load completely. If I add more complexity to my search, it takes even longer.
I do have huge volumes of data (millions of records). Is there a way to optimize?
index=abc feels like a simple query, but it's actually quite an expensive search for Splunk to run. It's definitely not a good candidate if you're looking for a "speed of light" test. A good speed of light test would be index=abc | stats count. Paradoxically, the added search syntax often allows Splunk to do less work.
In index=abc | stats count, Splunk notes that you don't actually need any fields extracted, so no lookups run, no raw event text is retrieved, nothing. It can really pare this search down to a bare minimum and do the work on the indexer.
In index=abc, you're telling Splunk you want to actually see the raw events, so it will run all possible field extractions, calculate the timeline and the summaries of all the fields and their top values, and the search head will have to pull all the raw event text and fields from the indexer to assemble the results locally.
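If you do need to look at raw events rather than just a count, narrowing what Splunk has to return helps. A minimal sketch (host and status are assumed field names, not from this thread):

```
index=abc | fields host, status | head 100
```

head stops the search after the first 100 matching events, and fields limits which fields get carried from the indexer to the search head, so far less raw data has to travel across the wire.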
@sideview
You are right... This is definitely better, but I still feel it's taking too long. I have 42 million records.
Splunk is taking almost 3-4 minutes to return a count for the query index=abc | stats count
Is this normal?
If I make a dashboard, will the speed improve?
Are you running the search in Fast Mode? (Below the time range picker, there's a dropdown to select the mode.)
That's a pretty good speed (~175,000 events scanned per second), indicating either that you have a lot of indexers, or you have one indexer with a very nice IO subsystem and SSD(s).
Let's take another tour though. You can schedule your search, and then the most recent results will load instantly on dashboards. Or you can "accelerate" it. Or, since you don't actually need to get any fields or use the raw text, you can do weird advanced things: | metasearch index=foo | stats count will be VASTLY faster, and | tstats count where index=foo is another advanced command you may never have heard of that may take just a few seconds to return.
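To sketch those advanced variants against the index in this thread (the sourcetype split is just an example; without an accelerated data model, tstats can only group by indexed and metadata fields such as host, source, and sourcetype):

```
| tstats count where index=abc
| tstats count where index=abc by sourcetype
| tstats count where index=abc by _time span=1h
```

Each line is a separate search. They read only the index-time metadata, never the raw events, which is why they can come back in seconds even over tens of millions of records.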
The weird thing about index=abc | stats count is that you're using Splunk, where you have the power to do any analytics on the fly and transform and mash up the data any way you want, but you're not doing any of it. It's sort of like using an ICBM to transport your cat a few doors down. It seems slow, but you're not appreciating the fact that your cat is briefly in space!
But forget that advanced stuff, fast though it is. I'd advise really slowing down and going back to the tutorial, maybe from here - http://docs.splunk.com/Documentation/Splunk/6.2.2/SearchTutorial/Aboutthesearchapp - or reading further ahead about scheduling searches, accelerating searches, using summary indexing, etc.
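As a sketch of the scheduling route: a saved search can be created in the UI, or defined in savedsearches.conf roughly like this (the stanza name, time range, and cron schedule below are placeholders, not from this thread):

```
[abc event count]
search = index=abc | stats count
dispatch.earliest_time = -24h
dispatch.latest_time = now
enable_sched = 1
cron_schedule = */15 * * * *
```

A dashboard panel that references the saved search then loads the most recent scheduled results instantly, instead of re-running the 3-4 minute search every time someone opens the dashboard.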
@somesoni2
I was not, but now I am... It is slightly better 🙂
Thanks