Splunk Dev

Python SDK Paginate Result Set

michaudel
Explorer

I have a fairly simple Python script I have been working on which gets the results from a search and does some work on them. However, I am having trouble paginating through the results so that I can pull in more than 50K results.

Following the documentation works when it paginates the result set 10 at a time, but this takes a really long time, even just to iterate through 90K results.

With a larger count, even though my result count is about 90K, the result reader always gives me 0 results.

For some reason, any value of count other than 10 breaks it.

Any thoughts would be great, thanks for your help.

import splunklib.results as results

searchPayroll = """ <some search>"""
# doSearch returns a job from a service connection which performs the search
job = doSearch(searchPayroll)

# Page through the results, `count` results at a time
resultCount = int(job["resultCount"])  # Number of results this job returned
offset = 0                             # Start at result 0
count = getMaxResults()                # 1 less than the max result count
trackerpayroll = 0                     # Track result count
dictResultSetPayRoll = dict()

while offset < resultCount:
    kwargs_paginate = {"count": count,
                       "offset": offset}

    # Get the search results
    blocksearch_results = job.results(**kwargs_paginate)
    readerResults = results.ResultsReader(blocksearch_results)

    for result in readerResults:
        # <do stuff here>
        # Problem: the result count (from job["resultCount"]) is 90K,
        # but readerResults yields 0 results.
        pass

    trackerpayroll += 1
    # Increase the offset to get the next set of results
    offset += count

paramagurukarth
Builder

If your doSearch method (dosearchjob in the code below) internally uses splunk.search.dispatch, add maxEvents=30000000 to its keyword arguments, i.e.:

splunk.search.dispatch(searchquery, sessionKey=sessionkey, hostPath=baseurl, earliestTime=earliestTime, latestTime=latestTime, maxEvents=30000000)

Then use the implementation below:

searchjob = dosearchjob(query)
resultCount = searchjob.resultCount
offsetValue = 0
searchresults = ""
while offsetValue < resultCount:
    # Fetch up to 49999 results per request and append them to the output
    searchresults = searchresults + str(searchjob.getFeed(mode='results', outputMode='csv', count=49999, offset=offsetValue))
    offsetValue = offsetValue + 49999

Use whatever outputMode you want 🙂
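For the splunklib-style job in the question (job.results with count and offset), the same chunked loop could be sketched roughly as below. This is only a sketch under assumptions: iter_job_results and make_reader are made-up names, the job is assumed to have finished already, and the reader is passed in as an argument so the loop does not depend on which reader class you use (for example results.ResultsReader). It advances the offset by the number of rows actually received and stops on an empty page, so a page that comes back with 0 results cannot spin forever.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


def iter_job_results(
    job: Any,
    make_reader: Callable[[Any], Iterable[Any]],
    page_size: int = 49999,
) -> Iterator[Dict[str, Any]]:
    """Yield every result row of a finished job, one page at a time.

    `job` is assumed to behave like a splunklib job: job["resultCount"]
    gives the total and job.results(count=..., offset=...) returns a stream.
    `make_reader` turns that stream into an iterable of rows.
    """
    total = int(job["resultCount"])  # may be reported as a string
    offset = 0
    while offset < total:
        # Request one page of at most page_size results
        stream = job.results(count=page_size, offset=offset)
        rows_in_page = 0
        for item in make_reader(stream):
            # Readers may also yield non-row items (e.g. messages); keep only rows
            if isinstance(item, dict):
                rows_in_page += 1
                yield item
        if rows_in_page == 0:
            # The server returned nothing for this page: stop instead of looping
            break
        # Advance by what was actually received, in case the server caps the page
        offset += rows_in_page
```

With the SDK this would be called as something like iter_job_results(job, results.ResultsReader), doing the per-row work inside a for loop over the returned rows.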

paramagurukarth
Builder

Please provide the code for your doSearch method
