Splunk Search

Splunk response time is quite slow when I use the lookup script

lpolo
Motivator

Splunk response time is quite slow when I use the lookup script presented below. The response time of the web service we use in the script is quite fast most of the times under 1ms per request. what should I look to improve the performance in splunk?

This is what job inspector says "dispatch.preview" takes most of the time:

Job inspector Results:

Duration (seconds)      Component   Invocations     Input count     Output count
    0.003   command.addinfo     3   9,352   9,352
    0.003   command.fields  3   9,352   9,352
    0.005   command.lookup  1   3,240   3,240
    0.07    command.prestats    3   9,352   3,240
    0.327   command.remotetl    3   9,352   9,352
    0.899   command.search  3   -   9,352
    0.569   command.search.kv   1   -   -
    0.243   command.search.typer    3   9,352   9,352
    0.067   command.search.rawdata  1   -   -
    0.009   command.search.calcfields   1   9,352   9,352
    0.006   command.search.filter   1   -   -
    0.006   command.search.index    3   -   -
    0.002   command.search.tags     3   9,352   9,352
    0.001   command.search.fieldalias   1   9,352   9,352
    0.001   command.search.lookups  1   9,352   9,352
    0.136   command.stats   1   3,240   3,240
    0.048   dispatch.createProviderQueue    1   -   -
    0.169   dispatch.evaluate   1   -   -
    0.091   dispatch.evaluate.lookup    1   -   -
    0.077   dispatch.evaluate.search    1   -   -
    0.001   dispatch.evaluate.stats     1   -   -
    1.293   dispatch.fetch  4   -   -
    **461.84    dispatch.preview    1   -   -**
    0.013   dispatch.process_remote_timeline    1   110,422     9,352
    1.292   dispatch.stream.local   3   -   -
    0.022   dispatch.timeline   4   -   -

Splunk Query:

earliest=-1d@d latest=@d index="si_usage" |stats sum(count) by entity| lookup entityid_lookup entity as entity OUTPUT entity_title as entity_title

Script:

import urllib2, sys,csv,time,logging
from time import clock
from lxml import etree
from time import strftime as date

def lookup(entity):
    try:

        url = "http://fqdn/rex/select/?q="+entity+"&version=2.2&start=0&rows=10&indent=on&fl=title&application=agent.007"

        f = urllib2.urlopen(url)
        doc = etree.XML(f.read())
        entity_title = doc.xpath("//str[@name='title']/text()")
        if not entity_title:
                entity_title = "Entity Title Not Found"
        return entity_title
    except:
        return []

def main():
    logging.basicConfig(filename='lookup.log', level=logging.INFO, format='%(asctime)s %(message)s')
    start_time = time.time()
    logging.info('Script started')
    entity = sys.argv[1]
    entity_title= sys.argv[2]
    r = csv.reader(sys.stdin)
    w = None
    first = True
    for line in r:
        if first:
          header = line
          if entity not in header or entity_title not in header:
                print "entity and entity_title fields must exist in CSV data"
                sys.exit(0)
          csv.writer(sys.stdout).writerow(header)
          w = csv.DictWriter(sys.stdout, header)
          first = False
          continue
        result = {}
        i = 0
        while i < len(header):
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1
        if len(result[entity]) and len(result[entity_title]):
            w.writerow(result)
        elif len(result[entity]):
            resulted = lookup(result[entity])
            result[entity_title]=resulted
            w.writerow(result)
    end_time = time.time() - start_time
    logging.info('Script finished in %s seconds', end_time)

if __name__ == '__main__':
    main()
Tags (3)
0 Karma

dwaddle
SplunkTrust
SplunkTrust

I am skeptical of the performance of your scripted lookup. I don't think that you could set up a TCP session (including DNS lookup time), make the service call, and get a response back in 1ms. Ayn's advice of the search inspector is good. In conjunction with the search job inspector you might (for a test) try running your search as follows:

earliest=-1d@d latest=@d index="si_usage" 
| stats sum(count) by entity 
| head 1 
| lookup entityid_lookup entity as entity OUTPUT entity_title as entity_title

This is to check to see what your minimum cost is for the lookup. Then, increase the number of items being looked up and see how your runtime scales.

lpolo
Motivator

If I run the script by itself the avg. rt is always around 50ms. If I use it as lookup script the rt is around 77ms. Splunk is adding ~27ms. I need the response time to be ~60ms. Using head:
head 1:
2012-08-28 17:23:09 Script finished in 0.047 sec
head 2:
2012-08-28 17:23:47 Script finished in 0.084 sec
head 3:
2012-08-28 17:24:07 Script finished in 0.122 sec
head 4:
2012-08-28 17:24:30 Script finished in 0.185 sec
head 5:
2012-08-28 17:33:13 Script finished in 0.392 sec
head 10:
2012-08-28 17:56:11 Script finished in 0.815 sec
head 100:
2012-08-28 17:54:12 Script finished in 7.470 sec

0 Karma

jrodman
Splunk Employee
Splunk Employee

To determine the cost of the script, it will be necessary to get the time at the beginning and the end of the script, for each run, and log all these times. Once you have measured the cumulative time of each and every invocation of the script from a single search run, you'll have a better idea if the cost is elsewhere.

0 Karma

lpolo
Motivator

I will be skeptical too I was not explicit enough. Let me elaborate based on your recommendation:
Job Inspector reported:

This search has completed and has returned 1 result by scanning 9,352 events in 2.044 seconds.
Out 2.044 seconds dispatch.preview took 1.275 second.

The response time of server that gets REST request was QTime=1ms . The lookup time to resolve the FQDN was :

real 0m0.005s
user 0m0.000s
sys 0m0.005s

If you add QTime + the lookup still we have less than 10ms. why is dispatch.preview taking so much time?
What do you suggest?

Thanks,
Lp

0 Karma

Ayn
Legend

Splunkweb includes an excellent utility for debugging your searches in terms of exactly what in the search took how much time. It's available in the Jobs list under the "Inspect" link.

Ayn
Legend

I'm as skeptical as dwaddle is regarding the 1ms response time, so I'm thinking your lookup needs some optimization for things to speed up.

0 Karma

lpolo
Motivator

The dispatch.preview is the offender. According to Splunk documentation: It is the time spent generating preview results.

Job inspector reported the following result set for on day sample:

This search has completed and has returned 3,240 results by scanning 9,352 events in 466.573 seconds.

Out of this 466.573 seconds dispatch.preview took 465.081 seconds.

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...