Solved: Unique KeyValue search performance

splunkrg · 01-25-2014

Hey Everyone,

I'm having a bit of trouble with Splunk search performance. I currently have around 1 million rows of logs, each row approximately 1 KB wide, that conform to the following pattern:

SomeKey1="stringdata" SomeKey2="stringdata" SomeKey3="stringdata" KeyID="UniqueNumericID"

When I do a search on this data using a simple search query such as:

search sourcetype=sourcetypeid KeyID="1"

It takes 20-30 seconds to return the single matching event on a dedicated server (quad-core Xeon, 16 GB RAM, SATA3 SSD), whether I use the GUI or the REST API. After inspecting many similar query jobs, the largest consumer of time seems to be dispatch.fetch / dispatch.stream.local. Since I need to run similar queries very often and programmatically, I assume the best approach would be to extract the KeyID field at index time. Would this drastically improve search speed? Are there any other pitfalls that I may have missed?

Thanks in advance.

Ayn · 01-26-2014

As martin_mueller says, it's important to know how unique the KeyID values are, not only within this specific sourcetype but across all data in the index.

@dwaddle has explained very well what goes on in a Splunk search here: http://answers.splunk.com/answers/54207/slow-search-when-evaluating-a-numeric-value?page=1&focusedAn...
It's a very good read and I think it answers your question. Short version: KeyID="1" will be slow because "1" is very likely a common token in your index. Since most fields aren't extracted until search time, when you search for KeyID="1" Splunk will in practice find all events containing the token "1" and THEN check whether any of those tokens can be matched to the field "KeyID". In this scenario, an index-time field extraction might be a good idea to improve performance.

splunkrg · 01-27-2014

Thanks for that, it was an interesting read. After a fair amount of pain, I have since set up the index-time field extraction, and the following search now takes between 100 and 200 ms. What a difference!
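For anyone trying to reproduce this, a minimal sketch of what an index-time extraction for KeyID typically looks like is below. It is not necessarily the exact setup used here: the transform name `keyid_indexed` and the class name `keyid` are hypothetical, and the regex assumes KeyID values are always double-quoted digits, as in the sample event above. The three stanzas go in three separate files (transforms.conf, props.conf, fields.conf) on the instance that parses the data (the indexer or a heavy forwarder), and only events indexed after the change get the new field.

```ini
# transforms.conf
# Capture the quoted numeric KeyID from the raw event and write it
# to the index as an indexed field (keyid_indexed is a hypothetical name).
[keyid_indexed]
REGEX = KeyID="(\d+)"
FORMAT = KeyID::$1
WRITE_META = true

# props.conf
# Apply the transform above to the sourcetype from the question
# (the "keyid" class name after TRANSFORMS- is hypothetical).
[sourcetypeid]
TRANSFORMS-keyid = keyid_indexed

# fields.conf
# Mark KeyID as indexed so the ordinary KeyID=1 syntax also uses the index.
[KeyID]
INDEXED = true
```

With the transform in place, `KeyID::1` queries the indexed field directly; the fields.conf stanza is what lets a plain `KeyID=1` search take the same fast path.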

search sourcetype=sourcetypeid KeyID::1

martin_mueller · 01-26-2014

Are you actually looking for a value "1" or is that just an example?

If you are, Splunk first loads all events containing "1" and then matches them against the field you were looking for. That's not very efficient, because there are presumably many events containing "1" where KeyID isn't "1".