Splunk Search

Large Table Lookup at Index time vs. Search time - Tradeoffs

beaumaris
Communicator

I have a rather large .csv file (500K rows), gathered from an external source, that is used for lookups in summarization queries. It works correctly, but I was wondering what the tradeoffs would be if the lookup were moved to props.conf as a LOOKUP- stanza so the lookups would occur when the log files are indexed. This feels like it could have a performance impact, since we are already running >20K log entries per minute through a number of regexes. Doing the lookup at index time would also let me reference the fields directly on the Search Head.

Currently, in most of the summarization searches, the lookup sits at the front of the query, so it ends up running against all of the events anyway. I don't have many searches where the lookup could be moved to the end of the query so that it runs against fewer data points.
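For illustration, here is a sketch of the two placements. All names here are made up for the example: a hypothetical index `weblogs`, a lookup file `assets.csv` keyed on `src_ip`, and an output field `owner`.

```
Lookup at the front of the query (runs against every matching event):

index=weblogs sourcetype=access_combined
| lookup assets.csv src_ip OUTPUT owner
| stats count BY owner

Lookup after aggregation (runs only against the reduced result rows):

index=weblogs sourcetype=access_combined
| stats count BY src_ip
| lookup assets.csv src_ip OUTPUT owner
```

The second form only works when the grouping key is the lookup's input field, which is why this refactoring is not always available.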

How does Splunk access the .csv table file? Is any or all of it cached? Can Splunk be instructed to load the table into a database? I seem to recall that it is read from disk for each lookup request.

Please let me know how well the table lookups scale and if you see any problem doing the lookups at index time.


gkanapathy
Splunk Employee

Hmm, actually there is a misconception here. Configuring a lookup as automatic with a LOOKUP- directive in props.conf does not do anything at index time; it runs at search time, exactly like using the lookup search command. The behavioral difference is that with an automatic LOOKUP, you can search on one of the output fields, and Splunk will transparently reverse-map that value and search for the corresponding input field values. It can therefore appear that the LOOKUP value was burned into the indexed data, but it was not.
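As a sketch of what that configuration looks like (stanza and field names here are hypothetical examples, not taken from the original poster's environment):

```
# transforms.conf -- define the lookup table; the file lives in the app's lookups/ directory
[asset_lookup]
filename = assets.csv

# props.conf -- bind it automatically to a sourcetype; applied at search time, not index time
[access_combined]
LOOKUP-assets = asset_lookup src_ip OUTPUT owner
```

With this in place, a search such as `sourcetype=access_combined owner=alice` is reverse-mapped: Splunk finds the `src_ip` values whose lookup output is `alice` and searches for those, which is what can make the output field look like it was indexed.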

To address your other concerns: lookups are very efficient and do not make a request for each value. Tables are indexed if necessary, and input values are deduplicated and queried in batches of roughly 50,000 unique input value sets. Results for scripted lookups are not cached, though. Half a million rows is not considered a particularly large lookup file for Splunk; a few million rows would start to count as large.
