Field extraction efficiency improvements

Peter
Path Finder

I am currently extracting 3 fields at index time based on a custom eventtype. I set this up a while ago and now realize it is probably not very efficient; searches on this data are rather slow. I can remove the reliance on the custom eventtype and switch to a new sourcetype dedicated to these events. I was also planning to move from index-time field extractions to search-time field extractions based on the new sourcetype.

Does this sound like it will yield a significant search speed improvement? Is there something else I can do to improve the efficiency of these data searches?

gkanapathy
Splunk Employee

I'm not really sure what you're actually doing.

As far as I know, you can't extract fields based on eventtypes in version 4.0. Are you on 3.x?

Furthermore (regardless of version), it's impossible to perform index-time extractions based on an eventtype. eventtype is only ever evaluated at search time.
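To make the ordering concrete, here is a toy Python model (not Splunk code; the sourcetype `my_app_log`, the regex, and every function name are hypothetical). Index-time rules can only key on metadata such as sourcetype, because an eventtype is just a search condition applied to events that have already been indexed.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    raw: str
    sourcetype: str
    indexed: dict = field(default_factory=dict)  # fields written at index time


# Hypothetical index-time rules, keyed on sourcetype: no eventtype has been
# evaluated yet at this point, so metadata is all a rule can key on.
INDEX_TIME_RULES = {
    "my_app_log": re.compile(r"user=(?P<user>\w+)"),
}


def index_event(raw, sourcetype):
    """Simulate indexing: apply any rule registered for this sourcetype."""
    ev = Event(raw=raw, sourcetype=sourcetype)
    rule = INDEX_TIME_RULES.get(sourcetype)
    if rule is not None:
        m = rule.search(raw)
        if m:
            ev.indexed.update(m.groupdict())
    return ev


def search_eventtype(events, terms):
    """Simulate an eventtype: a search condition applied to already-indexed events."""
    return [ev for ev in events if all(t in ev.raw for t in terms)]


events = [
    index_event("login ok user=alice", "my_app_log"),
    index_event("login ok user=bob", "other_log"),
]
print([ev.indexed for ev in events])                    # [{'user': 'alice'}, {}]
print(len(search_eventtype(events, ["login", "ok"])))   # 2
```

Both events match the eventtype at search time, but only the one whose sourcetype had an index-time rule got an indexed `user` field; the eventtype played no part in that decision.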

Assuming you're on 3.x and are actually doing search-time extractions, it's unlikely that switching to a sourcetype will greatly improve search speed, though this depends heavily on how your eventtype is defined relative to your indexed data.

I'll also mention that most index-time field extractions have little or no performance advantage over search-time ones, though there are some exceptions. Since I'm not clear on what you are actually doing, I'm not sure whether that applies to you.

Lowell
Super Champion

Classifying your events at index time with a new sourcetype, rather than with an eventtype at search time, certainly has the potential to improve search performance. How much depends on how efficient your existing eventtype definition is, which in turn depends on its search terms and how effectively they select the desired events and filter out the irrelevant ones. The primary advantage of a sourcetype here is that sourcetype is an indexed field, so it can narrow down your results faster than multiple terms in an eventtype. But again, you probably won't notice a difference unless your eventtype definition is really complex, in which case it's probably not easy to split these events into a separate sourcetype in the first place. There are other potential advantages to a separate sourcetype, but that's a different question.
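A toy inverted index can illustrate why an indexed sourcetype narrows results quickly: each token maps to the set of events containing it, and the sourcetype is stored as one more token. This is a simplified sketch, assuming whitespace tokenization and a hypothetical `sourcetype::<value>` key; Splunk's real segmentation and index layout are more involved.

```python
from collections import defaultdict

# Toy data: (sourcetype, raw text). Splunk's real segmentation is richer than split().
EVENTS = [
    ("auth", "login failed user=alice"),
    ("auth", "login ok user=bob"),
    ("web", "GET /login 200"),
    ("web", "login page rendered"),
]


def build_index(events):
    """Map each token, plus a hypothetical sourcetype::<value> key, to event ids."""
    postings = defaultdict(set)
    for eid, (sourcetype, raw) in enumerate(events):
        postings[f"sourcetype::{sourcetype}"].add(eid)
        for token in raw.split():
            postings[token].add(eid)
    return postings


def candidates(postings, tokens):
    """AND the posting lists together, smallest first, to get candidate event ids."""
    lists = sorted((postings.get(t, set()) for t in tokens), key=len)
    if not lists:
        return set()
    result = set(lists[0])
    for ids in lists[1:]:
        result &= ids
    return result


index = build_index(EVENTS)
print(sorted(candidates(index, ["login"])))             # [0, 1, 3]
print(sorted(candidates(index, ["sourcetype::auth"])))  # [0, 1]
```

Here the term `login` alone also matches an unrelated web event, which would have to be filtered out afterward, while the single `sourcetype::auth` lookup returns exactly the target events.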

I am curious about your indexed fields. You state that they are based on your eventtype, but I assume you mean either:

  1. You have fields that are indexed based on some source/sourcetype/host value (set up in props.conf), and the events are matched at search time using your eventtype.
  2. You are actually using search-time field extractions and only think they are indexed fields.

Using indexed fields should generally not be slower than using extracted fields. Indexed fields are not generally recommended because they are difficult to set up and maintain and they count against your license quota, but they should rarely (if ever) be slower than search-time fields. (You may want to double-check that they are set up properly in fields.conf. One quick-and-dirty test is to search for my_indexed_field::value instead of my_indexed_field=value; this forces the search to look up my_indexed_field as an indexed field rather than as a search-time extracted field. I use this trick frequently, since some of my field names are indexed in some places and extracted in others. For the record, I recommend avoiding that kind of confusion as much as possible.)
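The `::` test can be modelled the same way. In this toy sketch (hypothetical data and names, not Splunk internals), indexed fields live in a lexicon as `name::value` tokens, while `name=value` runs a search-time regex over each event's raw text. For simplicity the toy `name=value` path ignores indexed values entirely, which real Splunk does not necessarily do.

```python
import re
from collections import defaultdict

# Toy events: raw text plus any fields that were written at index time.
EVENTS = [
    {"raw": "action=login user=alice", "indexed": {"user": "alice"}},
    {"raw": "action=logout user=bob", "indexed": {}},  # user is NOT indexed here
]

# Hypothetical search-time extraction for the same field name.
EXTRACTIONS = {"user": re.compile(r"user=(\w+)")}


def build_lexicon(events):
    """Store each indexed field as a name::value token pointing at event ids."""
    lexicon = defaultdict(set)
    for eid, ev in enumerate(events):
        for name, value in ev["indexed"].items():
            lexicon[f"{name}::{value}"].add(eid)
    return lexicon


def search_double_colon(lexicon, name, value):
    """name::value: a lexicon lookup only; no extraction is ever run."""
    return sorted(lexicon.get(f"{name}::{value}", set()))


def search_equals(events, name, value):
    """name=value: run the search-time regex against each event's raw text."""
    pattern = EXTRACTIONS[name]
    hits = []
    for eid, ev in enumerate(events):
        m = pattern.search(ev["raw"])
        if m and m.group(1) == value:
            hits.append(eid)
    return hits


lexicon = build_lexicon(EVENTS)
print(search_double_colon(lexicon, "user", "bob"))  # []
print(search_equals(EVENTS, "user", "bob"))         # [1]
```

An empty result for `user::bob` alongside a hit for `user=bob` is the signature of a field that is extracted at search time rather than indexed.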

I recommend starting with a review of your existing eventtype definition to make sure there isn't a more efficient search you could use. The search should be faster if your eventtype doesn't rely on extracted fields, so try to stick to indexed terms only. Avoid leading wildcards (*). If possible, use punct; this can make a huge difference. Experiment with different combinations of search terms and see if you can find something that works better.
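Two of these tips can be sketched in Python. Assuming indexed terms sit in a sorted lexicon (a simplification), a trailing wildcard such as `log*` can binary-search to the first candidate, but a leading wildcard such as `*out` has to examine every term. The `approx_punct` helper is a rough, hypothetical approximation of the punct field (it drops letters and digits and turns spaces into underscores); Splunk's actual rules differ in details, but it shows why events with the same layout share a punct value.

```python
import bisect


def prefix_lookup(lexicon, prefix):
    """Trailing wildcard (prefix*): binary-search the sorted lexicon, then walk
    forward while terms share the prefix. Returns (matches, terms examined after the search)."""
    start = bisect.bisect_left(lexicon, prefix)
    matches, examined = [], 0
    for term in lexicon[start:]:
        examined += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, examined


def suffix_lookup(lexicon, suffix):
    """Leading wildcard (*suffix): sort order is no help, so every term is checked."""
    matches = [term for term in lexicon if term.endswith(suffix)]
    return matches, len(lexicon)


def approx_punct(raw):
    """Hypothetical punct approximation: drop letters/digits, map spaces to '_'."""
    return "".join("_" if ch == " " else ch for ch in raw if not ch.isalnum())


LEXICON = sorted(["error", "errors", "login", "logout", "timeout", "warn"])

print(prefix_lookup(LEXICON, "log"))   # (['login', 'logout'], 3)
print(suffix_lookup(LEXICON, "out"))   # (['logout', 'timeout'], 6)
print(approx_punct("2024-01-05 10:00:01 login user=alice"))  # --_::__=
```

With a realistic lexicon of millions of terms, the gap between "a few terms after a binary search" and "every term" is what makes leading wildcards expensive.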

I'm also curious: is your search slow only when you search on these fields, or any time you search against the events in question? Are your other searches slow too, or just the ones on these events? What timeframe are you searching?

Peter
Path Finder

Wow, quite a complete answer. THANKS! I did the quick field::value test, and it appears this is a search-time extraction. What I had to do in 3.x to get this working was a messy combination of props/transforms/fields/eventtypes. I knew that eventtypes are search-time, but I still can't understand why my transforms set DEST_KEY=_meta and fields sets INDEXED=true. You and gkanapathy have confirmed that what I am doing now is a mess that I will be glad to remove. I appreciate your response.

hylam
Contributor

Does the double-colon syntax still apply in Splunk 6?

Peter
Path Finder

I can understand your confusion. I originally set this up on 3.x, but am now on 4. I noticed that I couldn't do a field extraction based on an eventtype in 4, which led me to dig deeper into the work I had done previously and prompted this question. I guess I was looking for an opinion on the most efficient way to extract a field, and you've largely answered that with your last paragraph. I will remove the reliance on the eventtype and switch to a sourcetype with a search-time extraction.
