Splunk Dev

Python SDK Paginate Result Set

michaudel
Explorer

So I have a fairly simple python script i have been working on which gets the results from search and does some work on them. However i am having some trouble paginating through the results so i can pull in more than 50K results.

Following the documentation works where it paginates the result set 10 at a time, but this takes a really long time, even just to iterate through 90K results.

so even though my result count is about 90K the result reader is always giving me 0.

For some reason no matter what i put for a value of the count other than 10 it breaks.

Any thoughts would be great, thanks for your help.

searchPayroll = """ <some search>"""
#returns a job from a service connection which performs the search
job = doSearch(searchPayroll)

 # Page through results by looping through sets of 10 at a time
resultCount = job["resultCount"]  # Number of results this job returned
offset = 0                     # Start at result 0
count = getMaxResults()                # 1 less that the max result count.
trackerpayroll=0                    #track result count
dictResultSetPayRoll = dict()


while (offset < int(resultCount)):
    kwargs_paginate = {"count": count,
                       "offset": offset}

    # Get the search results
    blocksearch_results = job.results(**kwargs_paginate)
    readerResults = results.ResultsReader(blocksearch_results)

    for result in readerResults:
       <do stuff here problem:
        result count (from job["resultCount"]) is 90K
        reader results = 0

    trackerpayroll += 1
    # Increase the offset to get the next set of results
    offset += count

paramagurukarth
Builder

In your implementation, if dosearchjob method internally uses splunk.search.dispatch..

add maxEvents=30000000 to your **kwargs ..

i.e, splunk.search.dispatch(searchquery,sessionKey=sessionkey,hostPath=baseurl,earliestTime=earliestTime,latestTime=latestTime,maxEvents=30000000)

and use the below implementation

searchjob = dosearchjob(quey)
    resultCount  = searchjob.resultCount
    offsetValue = 0
    searchresults = ""
    while offsetValue < resultCount:
        searchresults = searchresults +  str(searchjob.getFeed(mode='results', outputMode='csv',count=49999,offset=offsetValue))
        offsetValue = offsetValue + 49999

Use whatever outputMode you want 🙂

paramagurukarth
Builder

Please provide the code for your doSearch method

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...