Hi,
We are using the Python script below to get results from Splunk. The problem is that through the UI we get more than 600,000 (6 lakh) records, but through the API we get only 50,000.
Please help - what do I need to add to the script below to get all the records?
import urllib
import httplib2
import time
import re
from time import localtime,strftime
from xml.dom import minidom
import json
baseurl = 'https://localhost:8089'
username = ''
password = ''
myhttp = httplib2.Http()
#Step 1: Get a session key
servercontent = myhttp.request(baseurl + '/services/auth/login', 'POST',
    headers={}, body=urllib.urlencode({'username': username, 'password': password}))[1]
sessionkey = minidom.parseString(servercontent).getElementsByTagName('sessionKey')[0].childNodes[0].nodeValue
print "====>sessionkey: %s <====" % sessionkey
#Step 2: Create a search job
searchquery = 'index="_internal" | head 10'
if not searchquery.startswith('search'):
    searchquery = 'search ' + searchquery
searchjob = myhttp.request(baseurl + '/services/search/jobs', 'POST',
    headers={'Authorization': 'Splunk %s' % sessionkey}, body=urllib.urlencode({'search': searchquery}))[1]
sid = minidom.parseString(searchjob).getElementsByTagName('sid')[0].childNodes[0].nodeValue
print "====>sid: %s <====" % sid
#Step 3: Get the search status
myhttp.add_credentials(username, password)
servicessearchstatusstr = '/services/search/jobs/%s/' % sid
isnotdone = True
while isnotdone:
    searchstatus = myhttp.request(baseurl + servicessearchstatusstr, 'GET')[1]
    isdonestatus = re.compile('isDone">(0|1)')
    isdonestatus = isdonestatus.search(searchstatus).groups()[0]
    if (isdonestatus == '1'):
        isnotdone = False
    else:
        time.sleep(1)  # pause between polls instead of busy-looping
    print "====>search status: %s <====" % isdonestatus
#Step 4: Get the search results
services_search_results_str = '/services/search/jobs/%s/results?output_mode=json&count=0' % sid
searchresults = myhttp.request(baseurl + services_search_results_str, 'GET')[1]
print "====>search result: [%s] <====" % searchresults
You're hitting a default search limit. You can increase this value within limits.conf
[searchresults]
maxresultrows = 50000
And/or:
[restapi]
maxresultrows = 50000
You'll need to restart Splunk after making the config change.
Generally speaking, when you see nice round numbers like 50000, then you're encountering a limitation/parameter within limits.conf
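For example, assuming you want headroom for the ~600,000 results mentioned above (the exact value is your call), the stanzas in $SPLUNK_HOME/etc/system/local/limits.conf might look like:

```ini
[searchresults]
maxresultrows = 600000

[restapi]
maxresultrows = 600000
```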
Hi @codebuilder
Thank you for sharing the answer. I was thinking of adding a loop to my code to check the values for count and offset and fetch the output based on those, but I am not sure how to implement that. Can you please help me with it?
I think it would be easier and more reliable if you instead narrow your search, either by excluding data or by narrowing the date range. It will perform much faster, and you can iterate through calls to that search to retrieve all the results you are seeking. It will be simpler to read and maintain, and will perform much better.
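As a sketch of that approach (the helper below is hypothetical, not part of the original script): split the overall time range into smaller windows and run the search once per window, so no single call has to return more than the 50,000-row cap.

```python
def time_windows(earliest, latest, step):
    """Yield (start, end) epoch pairs covering [earliest, latest) in step-sized chunks."""
    start = earliest
    while start < latest:
        end = min(start + step, latest)
        yield (start, end)
        start = end

# Example: one day split into hourly windows.
windows = list(time_windows(0, 86400, 3600))

# Each pair can then be passed to the search job request as
# earliest_time/latest_time, e.g. in the POST body of the script above:
#   urllib.urlencode({'search': searchquery,
#                     'earliest_time': start, 'latest_time': end})
```

Each window's results can then be appended to one combined output.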
(forgot to mention)
Also consider using an accelerated data model; your scenario sounds like a perfect candidate.
Hi,
For large dataset exports, please use the jobs/export endpoint: https://docs.splunk.com/Documentation/Splunk/7.2.6/RESTREF/RESTsearch#search.2Fjobs.2Fexport
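To tie that to the script earlier in the thread, here is a rough sketch (my own, not from the docs) of what the export call could look like; baseurl, sessionkey, and myhttp are assumed from that script. Unlike /services/search/jobs, the export endpoint streams all results back in a single response, with no sid to poll.

```python
try:
    from urllib import urlencode        # Python 2, as in the original script
except ImportError:
    from urllib.parse import urlencode  # Python 3

# The search and output format go in the POST body; there is no separate
# results call and no count parameter to worry about.
body = urlencode({
    'search': 'search index="_internal" | head 10',
    'output_mode': 'json',
})
export_url = '/services/search/jobs/export'
# response = myhttp.request(baseurl + export_url, 'POST',
#     headers={'Authorization': 'Splunk %s' % sessionkey}, body=body)[1]
```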
Hi @harsmarvania57
Can you please help me with how to implement it in the above code?
I am new to this, so any help would be much appreciated.
I'd prefer to do this using the Splunk Python SDK; have a look at https://docs.splunk.com/Documentation/Splunk/7.2.6/Search/ExportdatausingSDKs#Use_Python_SDK_to_expo...
@harsmarvania57 Thanks for sharing the link. I was thinking of adding a loop to my code that takes count as 50000 and offset as 0, then count as 50000 and offset as 50000, and so on. I am not sure how to add this loop to my code. Can you please help me with that?
You won't be able to achieve this with a loop, because the results endpoint returns at most 50,000 events. If you want to do this with the export endpoint and the Splunk Python SDK, let me know and I'll provide a script.
Hi @harsmarvania57, sure, please share the script - that would be great.
Try the query below (change the query, time range, and IP based on your requirements). You will need to download the Splunk Python SDK to run this script.
import sys
import getpass
import json
sys.path.append('splunk-sdk-python-1.6.4')
import splunklib.client as client
import splunklib.results as results
splunkUser = raw_input("Enter Splunk Username: ")
splunkPassword = getpass.getpass("Enter Splunk Password: ")
splunkService = client.connect(host='<IP>', port=8089, username=splunkUser, password=splunkPassword, verify=0)
kwargs_export = {"earliest_time": "-15m", "latest_time": "now", "search_mode": "normal"}
job = splunkService.jobs.export("search index=_internal | stats count by host,sourcetype", **kwargs_export)
rr = results.ResultsReader(job)
f = open('results.txt', 'w')
for result in rr:
    if isinstance(result, dict):
        a = json.dumps(dict(result))
        f.write(a)
assert rr.is_preview == False
f.close()
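One caveat with the script above: it writes the JSON objects to results.txt back-to-back with no separator, so the file as a whole is not valid JSON. A sketch (assuming the file was produced exactly as above) for reading it back with json.JSONDecoder.raw_decode:

```python
import json

def read_concatenated_json(text):
    """Parse a string of back-to-back JSON objects into a list."""
    decoder = json.JSONDecoder()
    idx, records = 0, []
    while idx < len(text):
        obj, end = decoder.raw_decode(text, idx)
        records.append(obj)
        idx = end
        while idx < len(text) and text[idx].isspace():
            idx += 1  # skip any whitespace between objects
    return records

sample = '{"host": "a", "count": "1"}{"host": "b", "count": "2"}'
records = read_concatenated_json(sample)
# records == [{'host': 'a', 'count': '1'}, {'host': 'b', 'count': '2'}]
```

Alternatively, writing one object per line (f.write(a + '\n')) makes the file plain newline-delimited JSON.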
Take a look at this answers post:
https://answers.splunk.com/answers/242114/limited-results-when-running-searches-via-rest-api.html
@kmorris_splunk Yes, I tried but it's not working.
Since applying that change, have you restarted the Splunk instance?