Adapting the solution from http://answers.splunk.com/answers/124848/python-sdk-paginate-result-set.html#answer-227017 (thanks to @paramagurukarthikeyan for the pointer and the answer), the following seems to work:
import sys
import io

import splunklib.client as client
import splunklib.results as results

service = client.connect(host=HOST, port=PORT, username=USERNAME, password=PASSWORD)

job = service.jobs.create(search, **{"exec_mode": "blocking",
                                     "earliest_time": start_time,
                                     "latest_time": end_time,
                                     "output_mode": "xml",
                                     "maxEvents": 30000000})

resultCount = int(job["resultCount"])

offset = 0        # Start at result 0
count = 50000     # Fetch sets of `count` results at a time
thru_counter = 0

while offset < resultCount:
    kwargs_paginate = {"count": count, "offset": offset}
    # Get the next page of search results and display a sample of them
    rs = job.results(**kwargs_paginate)
    reader = results.ResultsReader(io.BufferedReader(rs))
    wrt = sys.stdout.write
    for item in reader:
        if not (thru_counter % 50000):  # print only one in 50000 results as a sanity check
            line = ""
            for val in item.itervalues():
                line += val + ","
            wrt(line[:-1] + "\n")
        thru_counter += 1
    # Increase the offset to get the next page of results
    offset += count
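One fragile spot in the loop above is assembling the output line by hand: a field value that itself contains a comma or newline will corrupt the output. A small stdlib-only sketch of the safer alternative, using `csv.writer` (the row dicts here are invented stand-ins for the items a ResultsReader yields):

```python
import csv
import sys

# Hypothetical rows standing in for items yielded by ResultsReader;
# each item behaves like a dict of field -> value.
rows = [
    {"host": "web-01", "status": "200", "msg": "GET /index.html"},
    {"host": "web-02", "status": "500", "msg": "error: timeout, retrying"},  # embedded comma
]

writer = csv.writer(sys.stdout)
for item in rows:
    # csv.writer quotes fields containing commas or newlines automatically
    writer.writerow(list(item.values()))
```

With plain string concatenation the second row would print as four columns instead of three; `csv.writer` quotes the offending field instead.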
There is a remaining issue: the parsing is relatively slow (I am getting ~1300 rows/sec, where each row is 100 bytes, i.e. ~130 KB/s). The likely reason is hinted at in @ineeman's answer of March 10, 2014 to this question: http://answers.splunk.com/answers/114045/python-sdk-results-resultsreader-extremely-slow.html
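Since the linked answer points at XML parsing overhead in ResultsReader, one possible workaround (my assumption, not something confirmed in that thread) is to request the results with output_mode=json and decode the rows with the stdlib json module instead. A minimal sketch, using a fabricated sample payload in place of a real Splunk response body (the field names are invented for illustration):

```python
import json

# Fabricated sample: Splunk's JSON results body contains a "results"
# array of field -> value dicts (field names here are invented).
sample_body = '''{"results": [
    {"_time": "2015-01-01T00:00:00", "host": "web-01", "bytes": "512"},
    {"_time": "2015-01-01T00:00:01", "host": "web-02", "bytes": "1024"}
]}'''

rows = json.loads(sample_body)["results"]
for row in rows:
    print(",".join(row.values()))
```

In the paginated loop above, the same idea would mean passing "output_mode": "json" in kwargs_paginate and decoding each page's body directly, skipping the XML reader.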
I am posting a separate question to see if I can improve the speed of fetching the query results.