Adapting the solution from http://answers.splunk.com/answers/124848/python-sdk-paginate-result-set.html#answer-227017 (thanks to @paramagurukarthikeyan for the pointer and the answer), the following seems to work:
import sys
import io

import splunklib.client as client
import splunklib.results as results

service = client.connect(host=HOST, port=PORT, username=USERNAME, password=PASSWORD)

job = service.jobs.create(search, **{"exec_mode": "blocking",
                                     "earliest_time": start_time,
                                     "latest_time": end_time,
                                     "output_mode": "xml",
                                     "maxEvents": 30000000})

resultCount = int(job["resultCount"])

offset = 0        # Start at result 0
count = 50000     # Fetch sets of `count` results at a time
thru_counter = 0

while offset < resultCount:
    kwargs_paginate = {"count": count, "offset": offset}
    # Get the next page of search results and display a sample of them
    rs = job.results(**kwargs_paginate)
    reader = results.ResultsReader(io.BufferedReader(rs))
    wrt = sys.stdout.write
    for item in reader:
        if not (thru_counter % 50000):  # print only one in 50000 results as a sanity check
            line = ""
            for val in item.itervalues():
                line += val + ","
            wrt(line[:-1] + "\n")
        thru_counter += 1
    # Increase the offset to get the next page of results
    offset += count
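One fragile spot in the loop above is assembling the output line by hand: a field value that itself contains a comma or newline will corrupt the output. A small stdlib-only sketch of the safer alternative, using `csv.writer` (the row dicts here are invented stand-ins for the items a ResultsReader yields):

```python
import csv
import sys

# Hypothetical rows standing in for items yielded by ResultsReader;
# each item behaves like a dict of field -> value.
rows = [
    {"host": "web-01", "status": "200", "msg": "GET /index.html"},
    {"host": "web-02", "status": "500", "msg": "error: timeout, retrying"},  # embedded comma
]

writer = csv.writer(sys.stdout)
for item in rows:
    # csv.writer quotes fields containing commas or newlines automatically
    writer.writerow(list(item.values()))
```

With plain string concatenation the second row would print as four columns instead of three; `csv.writer` quotes the offending field instead.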
There is a remaining issue: the parsing is relatively slow (I am getting ~1300 rows/sec, where each row is 100 bytes, i.e. ~130 KB/s). The likely reason is hinted at in @ineeman's answer of March 10, 2014 to this question: http://answers.splunk.com/answers/114045/python-sdk-results-resultsreader-extremely-slow.html
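Since the linked answer points at XML parsing overhead in ResultsReader, one possible workaround (my assumption, not something confirmed in that thread) is to request the results with output_mode=json and decode the rows with the stdlib json module instead. A minimal sketch, using a fabricated sample payload in place of a real Splunk response body (the field names are invented for illustration):

```python
import json

# Fabricated sample: Splunk's JSON results body contains a "results"
# array of field -> value dicts (field names here are invented).
sample_body = '''{"results": [
    {"_time": "2015-01-01T00:00:00", "host": "web-01", "bytes": "512"},
    {"_time": "2015-01-01T00:00:01", "host": "web-02", "bytes": "1024"}
]}'''

rows = json.loads(sample_body)["results"]
for row in rows:
    print(",".join(row.values()))
```

In the paginated loop above, the same idea would mean passing "output_mode": "json" in kwargs_paginate and decoding each page's body directly, skipping the XML reader.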
I am posting a separate question to see if I can improve the speed of fetching the query results.