Hello,
I have a search I'm trying to speed up. I have a list of field values stored in a KV store, and I use an inputlookup subsearch to feed those values into a main search that returns all events matching any of them. Like this:
index=myevents [ | inputlookup MyKV | table MyFieldValues | fields MyFieldValues]
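For context, my understanding is that Splunk implicitly runs a subsearch's result rows through the format command and splices the resulting string into the outer search. The Python sketch below approximates that expansion so it's clear what the outer search ends up receiving. It assumes the default format output shape; the function name and sample values are hypothetical, and real quoting, escaping and empty-result handling may differ.

```python
def format_subsearch(field, values):
    """Approximate the string a one-column subsearch expands into.

    Assumption: mirrors the default shape of Splunk's `format` command,
    ( ( field="v1" ) OR ( field="v2" ) ). No escaping is done, and the
    empty-result case is not modeled (it just returns an empty string).
    """
    if not values:
        return ""
    # One parenthesized field="value" term per subsearch row
    terms = [f'( {field}="{v}" )' for v in values]
    # Rows are OR'd together and wrapped in an outer pair of parentheses
    return "( " + " OR ".join(terms) + " )"


# Hypothetical KV store contents
values = ["abc123", "def456"]
print("index=myevents " + format_subsearch("MyFieldValues", values))
```

With those two hypothetical values, the outer search becomes index=myevents ( ( MyFieldValues="abc123" ) OR ( MyFieldValues="def456" ) ), i.e. a plain list of literal terms in the base search.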
This search returns the events I want, but I'd like it to complete more quickly. Time-based filtering won't work, because the events span all time. Other attempts to pare down the results before running the subsearch didn't give any significant speedup either.
I started looking into accelerated data models to see if I could get better performance, but I'm seeing markedly worse performance when I create an accelerated data model and search it using "| datamodel mymodel search | search [ | inputlookup MyKV | table MyFieldValues | fields MyFieldValues]". Maybe I'm searching it the wrong way, but I couldn't find any documentation suggesting that's the case.
Is there any way to speed up this search? Or is Splunk already optimized by default in a way I can't really improve on here?
Thanks for any help you can provide.
Since my KV store is meant to reference specific events for later lookup, I wanted to see whether searching on a fairly unique, non-string field like _time would improve performance. I tried both the WHERE ... IN syntax (with a list of times to match) and an OR list (i.e. _time=1 OR _time=2 ...). Both performed markedly worse, by two orders of magnitude, than the subsearch that returns the field values to match. I also wanted to try returning _time from a subsearch to see if that made a difference, but returning the internal _time field from the subsearch produced no results (presumably because it isn't actually returned as an internal field).
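To make the two _time variants concrete, here is a small Python sketch that builds both filter strings from a list of epoch timestamps. The function name and timestamps are hypothetical, and the "| where _time IN (...)" form is my assumption about what the WHERE ... IN attempt looked like; nothing here talks to Splunk, it only produces the query text.

```python
def time_filters(epochs):
    """Build the two _time filter shapes from a list of epoch seconds.

    Returns (or_list, where_in) as plain strings:
      or_list  -> "_time=1 OR _time=2 ..."           (base-search OR list)
      where_in -> "| where _time IN (1, 2, ...)"     (assumed WHERE ... IN form)
    """
    or_list = " OR ".join(f"_time={t}" for t in epochs)
    where_in = "| where _time IN (" + ", ".join(str(t) for t in epochs) + ")"
    return or_list, where_in


# Hypothetical event timestamps taken from the KV store
or_list, where_in = time_filters([1609459200, 1609459260])
print("index=myevents " + or_list)
print("index=myevents " + where_in)
```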
I actually found this behavior rather unintuitive. I assumed that, since integer comparisons can be performed faster than string comparisons and Splunk already seems optimized for time-based comparisons (given its time-window searches), I would see better results. But it seems like a dead end unless there's a more performant way to structure a search based on a list of times.