Splunk Search

How to optimize performance for scripted lookups?

Justin_Grant
Contributor

I would like to use a lookup into an external database to add fields to my events, but need some advice about performance and caching of expensive lookups.

For example, say I have a log of orders made by users on my website. For each order I'd like to record the employee ID of the last customer support person who contacted that customer before purchase. This info is stored in an external database I can only query one record at a time (send customer ID, get back ID of support rep). The problem is that this system has no memory-- once a customer talks to another support rep, I can't get a record of the previous rep-to-customer interaction.

This means that we should be capturing the mapping as close as possible to when the initial data is logged.

Also, that external database is pretty slow (1 query = 0.5 seconds) so even if I could solve the history problem, I can't realistically do a lookup directly into that system every time I want to report on this data.

Finally, there will be millions of orders I am reporting over (each one with a potentially unique mapping to a support rep ID), so even at half a second per lookup, querying them one at a time at report time would take days.

I know about external lookup scripts, but I'm not sure if that's a good match for this use case, both for performance reasons and because the mapping changes over time.

Anyone have advice about how to handle this case?

1 Solution

gkanapathy
Splunk Employee

A lookup script is not appropriate here because it runs at search time. If the underlying data is changing and you need its state from some point in the past, a search-time lookup can't give you that, regardless of performance.
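For reference, an external lookup script (configured with external_cmd in transforms.conf) is roughly shaped like the sketch below; the field names and the query_support_rep helper are made up for illustration. Splunk hands the script a CSV of the rows being searched on stdin and reads the filled-in CSV back from stdout, which is exactly why it can only ever return the database's state at the moment you run the search.

#!/usr/bin/env python
# Minimal sketch of an external (scripted) lookup. Splunk invokes it at
# search time, passing a CSV with the lookup fields on stdin and expecting
# the same CSV, with output fields filled in, on stdout.
import csv
import sys

def query_support_rep(customer_id):
    # Hypothetical stand-in for the one-record-at-a-time query against the
    # external DB (~0.5 seconds per call).
    return ""

def main():
    reader = csv.DictReader(sys.stdin)
    writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row.get("customer_id") and not row.get("support_rep_id"):
            row["support_rep_id"] = query_support_rep(row["customer_id"])
        writer.writerow(row)

if __name__ == "__main__":
    main()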

What you essentially need is some way of capturing the state of the DB at the time of the logging (or shortly after that time).

The basic problem is that you want information you are not recording but instead just throwing away. (I don't even know why that database exists. Is it really that badly designed, or are you simplifying and making it seem so?) The only solution is to record that information. The "right" way to do this, in my mind, is to put the information into the order log in the first place. I suppose you could do this with a scripted input.
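A rough sketch of that scripted-input idea, with every helper and field name invented for illustration: the script runs on an interval (configured in inputs.conf), finds orders logged since its last run, queries the CRM once per order while the mapping is still current, and prints the enriched event to stdout for Splunk to index.

#!/usr/bin/env python
# Hypothetical scripted input: capture the customer -> support rep mapping
# close to the time the order is logged, while it still exists.
import sys
import time

def new_orders_since_last_run():
    # Hypothetical: fetch orders logged since the previous invocation
    # (order system API, database query, or a checkpoint file).
    return []

def query_support_rep(customer_id):
    # Hypothetical stand-in for the slow (~0.5 s) one-record CRM query.
    return "unknown"

def main():
    for order in new_orders_since_last_run():
        rep_id = query_support_rep(order["customer_id"])
        # One enriched event per order; Splunk indexes whatever the script
        # writes to stdout.
        print("%s order_id=%s customer_id=%s support_rep_id=%s" % (
            time.strftime("%Y-%m-%dT%H:%M:%S"),
            order["order_id"], order["customer_id"], rep_id))
        sys.stdout.flush()

if __name__ == "__main__":
    main()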

We can try to come up with other solutions, but in the end they all amount to getting and storing that mapping; the only difference is where it is stored. In my "right" way, it is stored directly in the order record. If it's impossible to make changes to the source log, and Splunk is our only tool to make this work, we could also try to record every single change to a row in the database and then correlate back by time and order ID at search time. Or we could have a scripted input that watches incoming Splunk logs of the orders and does lookups more selectively, possibly in batches. This amounts to using Splunk to completely replace that database, both by actually storing the data and by not being slow.
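The selective, batched variant could look something like this (again, all names here are hypothetical): collect the customer IDs seen since the last run, deduplicate them, look each one up once, and emit timestamped customer-to-rep mapping events that can be correlated back to the order events at search time.

#!/usr/bin/env python
# Hypothetical batched scripted input: build a timestamped mapping index
# in Splunk instead of enriching every order event individually.
import time

def customer_ids_since_last_run():
    # Hypothetical: customer IDs seen in order events since the last run,
    # e.g. exported from Splunk via the REST search API or a saved search.
    return []

def query_support_rep(customer_id):
    # Hypothetical stand-in for the slow external CRM query.
    return "unknown"

def main():
    # Deduplicate so each customer is queried at most once per run.
    for customer_id in set(customer_ids_since_last_run()):
        rep_id = query_support_rep(customer_id)
        # Each line becomes a timestamped mapping event in Splunk, which can
        # later be joined back to orders by customer_id and time.
        print("%s customer_id=%s support_rep_id=%s" % (
            time.strftime("%Y-%m-%dT%H:%M:%S"), customer_id, rep_id))

if __name__ == "__main__":
    main()

Batching only reduces the query volume; Splunk is still being asked to store every mapping the CRM throws away.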

If the orders come in at a high rate, and the DB takes half a second to respond to each query, any of these solutions is going to cause problems when we try to capture DB state at index time. So in the course of trying to use Splunk to replace the (incomplete and slow) database, we end up trapped in the bottleneck of having to pull every state change out of that slow database.

I don't consider this a "wrong" solution, but it is over-engineering a complex and fragile workaround to a problem that should be fixed at its source.

Fundamentally, this application is broken around this database, and Splunk is not really the right solution.

gkanapathy
Splunk Employee

Well, slow is one thing. The real problem here is non-existent data.

Justin_Grant
Contributor

re: "is it that you're simplifying and making it seem so". Yep, I'm simplifying. But the general use-case here (dealing with slow lookups) is pretty common, where you want to annotate logged data with other data (e.g. reverse DNS lookup, search engine results, etc.) which take a few hundred milliseconds to return. In most cases, throughput shouldn't be a problem since these kinds of data sources usually are OK wtih making many calls in parallel-- as long as you don't end up having only one thread doing the querying.
