Hi,
We are using the Python script below to get results from Splunk. The problem is that through the UI we get more than 600,000 (6 lakh) records, but through the API we get only 50,000.
Please help - what do I need to add to the script below to get all the records?
import urllib
import httplib2
import time
import re
from time import localtime,strftime
from xml.dom import minidom
import json
baseurl = 'https://localhost:8089'
username = ''
password = ''
myhttp = httplib2.Http()
#Step 1: Get a session key
servercontent = myhttp.request(baseurl + '/services/auth/login', 'POST',
    headers={}, body=urllib.urlencode({'username': username, 'password': password}))[1]
sessionkey = minidom.parseString(servercontent).getElementsByTagName('sessionKey')[0].childNodes[0].nodeValue
print "====>sessionkey: %s <====" % sessionkey
#Step 2: Create a search job
searchquery = 'index="_internal" | head 10'
if not searchquery.startswith('search'):
    searchquery = 'search ' + searchquery
searchjob = myhttp.request(baseurl + '/services/search/jobs', 'POST',
    headers={'Authorization': 'Splunk %s' % sessionkey}, body=urllib.urlencode({'search': searchquery}))[1]
sid = minidom.parseString(searchjob).getElementsByTagName('sid')[0].childNodes[0].nodeValue
print "====>sid: %s <====" % sid
#Step 3: Get the search status
myhttp.add_credentials(username, password)
servicessearchstatusstr = '/services/search/jobs/%s/' % sid
isnotdone = True
while isnotdone:
    searchstatus = myhttp.request(baseurl + servicessearchstatusstr, 'GET')[1]
    isdonestatus = re.compile('isDone">(0|1)')
    isdonestatus = isdonestatus.search(searchstatus).groups()[0]
    if (isdonestatus == '1'):
        isnotdone = False
    else:
        time.sleep(1)  # pause between polls instead of busy-looping
    print "====>search status: %s <====" % isdonestatus
#Step 4: Get the search results
services_search_results_str = '/services/search/jobs/%s/results?output_mode=json&count=0' % sid
searchresults = myhttp.request(baseurl + services_search_results_str, 'GET')[1]
print "====>search result: [%s] <====" % searchresults
You're hitting a default search limit. You can increase this value within limits.conf
[searchresults]
maxresultrows = 50000
And/or:
[restapi]
maxresultrows = 50000
You'll need to restart Splunk after making the config change.
Generally speaking, when you see nice round numbers like 50000, then you're encountering a limitation/parameter within limits.conf
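For example, assuming you want headroom for the ~600,000 results mentioned above (the exact value is your call), the stanzas in $SPLUNK_HOME/etc/system/local/limits.conf might look like:

```ini
[searchresults]
maxresultrows = 600000

[restapi]
maxresultrows = 600000
```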
Hi @codebuilder
Thank you for sharing the answer. I was thinking of adding a loop to my code to check the values for count and offset and fetch the output based on those, but I am not sure how to implement that. Can you please help me with it?
I think it would be easier and more reliable if you instead narrow your search, either by excluding data or by narrowing the date range. It will perform much faster, and you can iterate through calls to that search to retrieve all the results you are seeking. It will be simpler to read and maintain, and will perform much better.
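As a sketch of that approach (the helper below is hypothetical, not part of the original script): split the overall time range into smaller windows and run the search once per window, so no single call has to return more than the 50,000-row cap.

```python
def time_windows(earliest, latest, step):
    """Yield (start, end) epoch pairs covering [earliest, latest) in step-sized chunks."""
    start = earliest
    while start < latest:
        end = min(start + step, latest)
        yield (start, end)
        start = end

# Example: one day split into hourly windows.
windows = list(time_windows(0, 86400, 3600))

# Each pair can then be passed to the search job request as
# earliest_time/latest_time, e.g. in the POST body of the script above:
#   urllib.urlencode({'search': searchquery,
#                     'earliest_time': start, 'latest_time': end})
```

Each window's results can then be appended to one combined output.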
(forgot to mention)
Also consider using an accelerated data model; your scenario sounds like a perfect candidate.
Hi,
For large dataset exports, please use the jobs/export endpoint: https://docs.splunk.com/Documentation/Splunk/7.2.6/RESTREF/RESTsearch#search.2Fjobs.2Fexport
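To tie that to the script earlier in the thread, here is a rough sketch (my own, not from the docs) of what the export call could look like; baseurl, sessionkey, and myhttp are assumed from that script. Unlike /services/search/jobs, the export endpoint streams all results back in a single response, with no sid to poll.

```python
try:
    from urllib import urlencode        # Python 2, as in the original script
except ImportError:
    from urllib.parse import urlencode  # Python 3

# The search and output format go in the POST body; there is no separate
# results call and no count parameter to worry about.
body = urlencode({
    'search': 'search index="_internal" | head 10',
    'output_mode': 'json',
})
export_url = '/services/search/jobs/export'
# response = myhttp.request(baseurl + export_url, 'POST',
#     headers={'Authorization': 'Splunk %s' % sessionkey}, body=body)[1]
```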
Hi @harsmarvania57
Can you please help me with how to implement it in the above code?
I am new to this, so any help would be much appreciated.
I'd prefer to do this using the Splunk Python SDK; have a look at https://docs.splunk.com/Documentation/Splunk/7.2.6/Search/ExportdatausingSDKs#Use_Python_SDK_to_expo...
@harsmarvania57 Thanks for sharing the link. I was thinking of adding a loop to my code that takes count as 50000 and offset as 0, then count as 50000 and offset as 50000, and so on. I am not sure how to add this loop to my code. Can you please help me with that?
You won't be able to achieve this with a loop, because the results endpoint returns at most 50,000 events. If you want to do this with the export endpoint and the Splunk Python SDK, let me know and I'll provide a script.
Hi @harsmarvania57, sure, please share the script - that would be great.
Try the query below (change the query, time range, and IP based on your requirements). You will need to download the Splunk Python SDK to run this script.
import sys
import getpass
import json
sys.path.append('splunk-sdk-python-1.6.4')
import splunklib.client as client
import splunklib.results as results
splunkUser = raw_input("Enter Splunk Username: ")
splunkPassword = getpass.getpass("Enter Splunk Password: ")
splunkService = client.connect(host='<IP>', port=8089, username=splunkUser, password=splunkPassword, verify=0)
kwargs_export = {"earliest_time": "-15m", "latest_time": "now", "search_mode": "normal"}
job = splunkService.jobs.export("search index=_internal | stats count by host,sourcetype", **kwargs_export)
rr = results.ResultsReader(job)
f = open('results.txt', 'w')
for result in rr:
    if isinstance(result, dict):
        a = json.dumps(dict(result))
        f.write(a)
assert rr.is_preview == False
f.close()
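One caveat with the script above: it writes the JSON objects to results.txt back-to-back with no separator, so the file as a whole is not valid JSON. A sketch (assuming the file was produced exactly as above) for reading it back with json.JSONDecoder.raw_decode:

```python
import json

def read_concatenated_json(text):
    """Parse a string of back-to-back JSON objects into a list."""
    decoder = json.JSONDecoder()
    idx, records = 0, []
    while idx < len(text):
        obj, end = decoder.raw_decode(text, idx)
        records.append(obj)
        idx = end
        while idx < len(text) and text[idx].isspace():
            idx += 1  # skip any whitespace between objects
    return records

sample = '{"host": "a", "count": "1"}{"host": "b", "count": "2"}'
records = read_concatenated_json(sample)
# records == [{'host': 'a', 'count': '1'}, {'host': 'b', 'count': '2'}]
```

Alternatively, writing one object per line (f.write(a + '\n')) makes the file plain newline-delimited JSON.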
Take a look at this answers post:
https://answers.splunk.com/answers/242114/limited-results-when-running-searches-via-rest-api.html
@kmorris_splunk Yes, I tried but it's not working.
Since applying that change, have you restarted the Splunk instance?