Splunk Dev

Python SDK Paginate Result Set

michaudel
Explorer

So I have a fairly simple python script i have been working on which gets the results from search and does some work on them. However i am having some trouble paginating through the results so i can pull in more than 50K results.

Following the documentation works where it paginates the result set 10 at a time, but this takes a really long time, even just to iterate through 90K results.

so even though my result count is about 90K the result reader is always giving me 0.

For some reason no matter what i put for a value of the count other than 10 it breaks.

Any thoughts would be great, thanks for your help.

searchPayroll = """ <some search>"""
#returns a job from a service connection which performs the search
job = doSearch(searchPayroll)

 # Page through results by looping through sets of 10 at a time
resultCount = job["resultCount"]  # Number of results this job returned
offset = 0                     # Start at result 0
count = getMaxResults()                # 1 less that the max result count.
trackerpayroll=0                    #track result count
dictResultSetPayRoll = dict()


while (offset < int(resultCount)):
    kwargs_paginate = {"count": count,
                       "offset": offset}

    # Get the search results
    blocksearch_results = job.results(**kwargs_paginate)
    readerResults = results.ResultsReader(blocksearch_results)

    for result in readerResults:
       <do stuff here problem:
        result count (from job["resultCount"]) is 90K
        reader results = 0

    trackerpayroll += 1
    # Increase the offset to get the next set of results
    offset += count

paramagurukarth
Builder

In your implementation, if dosearchjob method internally uses splunk.search.dispatch..

add maxEvents=30000000 to your **kwargs ..

i.e, splunk.search.dispatch(searchquery,sessionKey=sessionkey,hostPath=baseurl,earliestTime=earliestTime,latestTime=latestTime,maxEvents=30000000)

and use the below implementation

searchjob = dosearchjob(quey)
    resultCount  = searchjob.resultCount
    offsetValue = 0
    searchresults = ""
    while offsetValue < resultCount:
        searchresults = searchresults +  str(searchjob.getFeed(mode='results', outputMode='csv',count=49999,offset=offsetValue))
        offsetValue = offsetValue + 49999

Use whatever outputMode you want 🙂

paramagurukarth
Builder

Please provide the code for your doSearch method

0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...