I have a Python script that runs nightly and extracts data using the Splunk REST API. Here is the code:
import splunklib.client as client

# service is an authenticated splunklib connection (setup omitted here)
kwargs_oneshot = {'latest_time': '2014-10-23T10:00:00.000',
                  'earliest_time': '2014-10-23T08:00:00.000',
                  'output_mode': 'csv'}
searchquery_oneshot = 'search source=xyz event="watch" | table _time, event | sort - _time'
oneshotsearch_results = service.jobs.oneshot(searchquery_oneshot, **kwargs_oneshot)
with open('myresults.csv', 'w') as f:
    f.write(oneshotsearch_results.read())
The result set seems to have a limit of 100 records. Is there any way to set it to unlimited? I don't see anything related to that at http://docs.splunk.com/Documentation/PythonSDK/1.2.2/client.html
If not, how else can I make sure I retrieve the entire result set?
Thanks
You have to create a $SPLUNK_HOME/etc/system/local/limits.conf file and add the stanza:
[restapi]
maxresultrows = 4294967295
You also have to add 0 to the sort command in your search:
query = """
search source=xyz event="watch" |
table _time event |
sort 0 - _time
"""
and run in your Python code:
service.jobs.oneshot(query, count=0)
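Putting the two changes together, a complete call might look like this (a sketch: the connection details are placeholders, and the time bounds are taken from the question):
import splunklib.client as client

# Placeholder connection details -- substitute your own host and credentials.
service = client.connect(host='localhost', port=8089,
                         username='admin', password='changeme')

query = """
search source=xyz event="watch" |
table _time event |
sort 0 - _time
"""

kwargs = {
    'earliest_time': '2014-10-23T08:00:00.000',
    'latest_time': '2014-10-23T10:00:00.000',
    'output_mode': 'csv',
    'count': 0,  # 0 = return all rows (bounded by maxresultrows in limits.conf)
}

results = service.jobs.oneshot(query, **kwargs)
with open('myresults.csv', 'w') as f:
    f.write(results.read())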
If you dig into the minimal and cryptic documentation (http://docs.splunk.com/Documentation/PythonSDK), you can read of jobs.oneshot() that:
The oneshot method makes a single roundtrip to the server (as opposed to two for create() followed by results()).
So jobs.oneshot() is (almost) a jobs.create() followed by a job.results(), and it can therefore take the arguments of create() (http://dev.splunk.com/view/SP-CAAAEE5#searchjobparams) as well as the arguments of results() (http://docs.splunk.com/Documentation/Splunk/6.2.2/RESTREF/RESTsearch#search.2Fjobs.2F.7Bsearch_id.7D...).
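To illustrate, here is a rough sketch of the same search in its explicit two-roundtrip form (assuming service is an existing splunklib connection and query is defined as above):
# The two-roundtrip equivalent of oneshot():
# create() takes the search-job parameters...
job = service.jobs.create(query, exec_mode='blocking',
                          earliest_time='2014-10-23T08:00:00.000',
                          latest_time='2014-10-23T10:00:00.000')
# ...and results() takes the result-fetching parameters.
reader = job.results(count=0, output_mode='csv')
print(reader.read())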
Since the Python SDK is a thin wrapper around the REST API, you also have to specify a higher limit for it in limits.conf: http://docs.splunk.com/Documentation/Splunk/6.2.3/Admin/Limitsconf
Note that I specified 2^32 - 1 in maxresultrows, because if you run this code on a 32-bit machine it hangs:
job = splunk_connection.jobs.create(search, max_count=2**32)
This is probably caused by an integer overflow in a C for loop.
From the sort documentation (http://docs.splunk.com/Documentation/Splunk/6.2.1/SearchReference/Sort):
sort [<count>] (<sort-by-clause>)+ [desc]
<count>
Syntax: <int>
Description: Specify the number of results to sort. If no count is specified, the default limit of 10000 is used. If "0" is specified, all results will be returned.
Really appreciated the depth and detail of this answer. It got our local dev environment searching and returning correctly in minutes.
Are there any ideas for the case where the API consumer doesn't have the ability to change the Splunk instance's maxresultrows? The client we are building will be deployed separately to customers who run Splunk, and we won't have the authority to make that change, only to advise that it should be made.
Did you find a solution? It would be great if you could share it.
Thanks
As I mentioned, I changed my code to use blocking and pagination. The search stopping at 10000 was my oversight: I had forgotten to include 0 in the sort command. Adding 0 to the sort command and looping over the pages took care of getting all the results back from the search.
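For reference, a blocking-plus-pagination loop along those lines might look like this (a sketch in the Python 2 style of the question's code; the page size is illustrative, and it assumes each CSV page repeats its header row):
query = 'search source=xyz event="watch" | table _time, event | sort 0 - _time'

# Blocking mode: create() returns only once the job has finished.
job = service.jobs.create(query, exec_mode='blocking',
                          earliest_time='2014-10-23T08:00:00.000',
                          latest_time='2014-10-23T10:00:00.000')

total = int(job['resultCount'])
page_size = 1000  # illustrative; keep it below maxresultrows
offset = 0

with open('myresults.csv', 'w') as f:
    while offset < total:
        # Fetch one page of results at the current offset.
        page = job.results(count=page_size, offset=offset,
                           output_mode='csv').read()
        if offset > 0 and '\n' in page:
            # Drop the repeated CSV header on every page after the first.
            page = page.split('\n', 1)[1]
        f.write(page)
        offset += page_size

job.cancel()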
I changed the Python script to do blocking, following the pagination example. It goes through the loop and extracts 100 at a time (my count size for testing), but it still stops when the offset reaches 10000! How can I make it retrieve hundreds of thousands of events?
I know adding 'count': 0 lets the result set return 10000 entries. However, I am looking to export about 400000 records (or at least 100000 entries on a nightly basis). What is the best way to do that?
Have you looked at the limits.conf spec? It seems to me you'll be hitting one, if not many, output limits here. Even if you adjust your limits.conf to allow more output, you'll still hit a ceiling, most certainly on subsearches.