Splunk Search

Prevent Splunk from streaming results to a custom search command?

mute_dammit
Engager

I've created a custom command in Python that needs to see an entire set of events as a single batch, because it compares subsequent events. Unfortunately, Splunk is sending events to the custom command in chunks of <= 50,000 events. The command's commands.conf stanza has streaming = false. Setting run_in_preview = false only changes the way the results are displayed, as expected.

In case it's relevant, the command is running on a search head which receives events from several distributed search nodes.

Here's the basic code -- run() is invoked by a minimal plugin "manager":

# Imports used below (declared at the top of the plugin module):
from datetime import datetime
import splunk.Intersplunk as intersplunk

class RemoteLogins( SplunkPlug ):
  def run( self, events, keywords, options ):
    out_events = []
    if not events:
      # Nothing received; emit an empty result set and bail out.
      intersplunk.outputResults( out_events )
      return
    now = datetime.now()
    with open( "/opt/splunk/var/log/test.log", "a" ) as f:
      f.write( "Running at %s with %s events\n" % ( now, len( events ) ) )
    # Group related events, then look for overlaps within each group.
    for related_events in self.related( events ):
      self.find_overlap( related_events, out_events )
    with open( "/opt/splunk/var/log/test.log", "a" ) as f:
      f.write( "Ending %s with %s results\n" % ( now, len( out_events ) ) )
    intersplunk.outputResults( out_events )

When invoked by a single Splunk search, these log entries are produced:

Running at 2011-08-27 16:56:18.619245 with 25 events
Ending 2011-08-27 16:56:18.619245 with 0 results
Running at 2011-08-27 16:56:19.078111 with 2942 events
Ending 2011-08-27 16:56:19.078111 with 0 results
Running at 2011-08-27 16:56:20.900458 with 19980 events
Ending 2011-08-27 16:56:20.900458 with 1 results
Running at 2011-08-27 16:56:31.590848 with 50000 events
Ending 2011-08-27 16:56:31.590848 with 4 results
Running at 2011-08-27 16:56:55.376255 with 50000 events
Ending 2011-08-27 16:56:55.376255 with 3 results

Once the search is complete, only the 3 results from the last batch of events are shown.

For completeness, here's commands.conf:

[py]
type = python
filename = py.py
streaming = false
run_in_preview = false
maxinputs = 0

So, is there any way aside from the settings in commands.conf to really convince Splunk not to stream events into a custom command? Maybe an intermediate command I could insert into the pipeline?

1 Solution

sideview
SplunkTrust

Well, either someone else can spot what's missing or can confirm that it's a bug, but for the time being, an easy way to make sure streamed chunks never reach your command is just to put a non-streaming command in front of it.

`<your search> | table * | py`

should do it.


jrodman
Splunk Employee

That behavior doesn't seem right to me, but streaming=false was never intended to make Splunk deliver all the events to the search command regardless of event quantity. To my understanding, it is supposed to influence how the search machinery thinks, and encourage it to give the search command only one chunk.

Essentially, you could view this flag as "I'm only designed for small datasets".

In order to make your tool work over large datasets, you'll want to be streaming, and you'll want to be able to handle the data chunk by chunk.

For some problems that opens up an entirely new topic: how do you store your state efficiently, is it valid to emit nothing until the last call, and how do you know when it's the last call?
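
To make the chunk-by-chunk idea concrete, here's a rough sketch using the legacy splunk.Intersplunk helpers. The state-file path and pickle-based persistence are just assumptions for the example, and knowing when the last chunk has arrived is exactly the open question here.

import os
import pickle

import splunk.Intersplunk as intersplunk

# Assumed location for carrying accumulated state between chunk invocations.
STATE_FILE = "/opt/splunk/var/run/splunk/py_command_state.pkl"

def load_state():
    # Reload whatever earlier chunks accumulated, if anything.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE, "rb") as f:
            return pickle.load(f)
    return []

def save_state(state):
    with open(STATE_FILE, "wb") as f:
        pickle.dump(state, f)

if __name__ == "__main__":
    # Each invocation sees only one chunk of up to 50,000 events.
    events = intersplunk.readResults()
    state = load_state()
    state.extend(events)
    save_state(state)
    # Emit nothing per chunk; emitting the collated results requires knowing
    # which chunk is the last one, which this sketch does not solve.
    intersplunk.outputResults([])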

mridus
New Member

Hi, any idea how we can determine which is the last call, so that we can collate all the results and emit nothing until then?



sideview
SplunkTrust

mute_dammit: yeah, once you're out of the streaming portion I'm afraid 50,000 is the default in limits.conf. It can be changed, although that's to be filed under "do at your own risk"...
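
If memory serves, the relevant setting is maxresultrows under [searchresults] in limits.conf, which also caps maxinputs in commands.conf; the change would look something like the sketch below, but verify the stanza against the limits.conf spec for your version before touching it.

# limits.conf on the search head (assumed stanza and setting)
[searchresults]
# The default is 50000. Raising it increases memory use for every search,
# hence "do at your own risk".
maxresultrows = 200000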


gwobben
Communicator

I know this is old, but is there any update on this?


jrodman
Splunk Employee

If you're writing custom search commands, the update is that the Python SDK now offers significant support for doing so, which should let you work with the model without a lot of difficulty. The fundamental behavior of the interaction hasn't changed, to my knowledge.

Splunk did some work on long-running Python processes a few releases ago, but I don't think we "leveraged" it for search commands.
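
For example, something along these lines with splunklib.searchcommands -- a sketch only, where the command name, the "user" and "_time" comparison, and the emitted fields are placeholders; reduce() is the documented place where a reporting command sees the complete record set.

import sys

from splunklib.searchcommands import dispatch, ReportingCommand, Configuration

@Configuration()
class PyCommand(ReportingCommand):
    """Compares subsequent events across the whole result set: ... | py"""

    @Configuration()
    def map(self, records):
        # Pass events through unchanged during the distributed (map) phase.
        return records

    def reduce(self, records):
        # reduce() runs on the search head over the complete set of mapped
        # records, so consecutive events can be compared with each other.
        previous = None
        for record in records:
            if previous is not None and record.get("user") == previous.get("user"):
                # Placeholder "overlap" result; real comparison logic goes here.
                yield {
                    "user": record.get("user"),
                    "first_time": previous.get("_time"),
                    "second_time": record.get("_time"),
                }
            previous = record

dispatch(PyCommand, sys.argv, sys.stdin, sys.stdout, __name__)

The SDK is meant to handle the per-chunk protocol exchange with splunkd, so the script itself shouldn't have to manage the 50,000-event chunks.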


sideview
SplunkTrust

scanCount numbers below 5000 or 10000 can be pretty misleading. Splunk will pretty much always scan at least that deep into any search before potentially shutting down the stream, because that's the sort of "chunk" size that the search process uses when talking to the index. Or so I understand.


asingla
Communicator

It didn't work for me. I'm using a dedup command in my search; the search scanned roughly 4000 events, but the result set size is only 11 events.


mute_dammit
Engager

Adding the non-streaming command does keep Splunk from sending multiple chunks of events to the custom script. Unfortunately, only the last 50k events are sent. Since I've asked for unlimited inputs, this is pretty clearly a bug.
