Where should I run my report that populates a summary index?

markwymer
Path Finder

Hi all,

I've got a simple search and filter that gets piped into the collect command to create a summary index. I've saved it as a report and scheduled it to run every hour on one of my non-clustered search heads. The data is being extracted from an index that lives on two load-balanced indexers.

Everything is fine and, for this particular search, subsequent search times have dropped dramatically.

My question is: would it be better to run the scheduled report from the two indexers and spread the summary index load across them, or must the summary index be stored on the search head?

If I can run the report on indexer1, will it pick up the indexed data from indexer2? Or can/should I run it on both?

Sorry, I just thought of a new question! If I run the report on a 'don't index and forward' search head, will it automatically send the summary index data to the indexers?

Sorry for so many questions but your help, as always, is gratefully received.
Mark.

Jeremiah
Motivator

So, let's break down what happens when you schedule and run a summary search. When you run the search, like any search in Splunk, it is distributed to the search peers (indexers) configured on the search head. The search head then takes the results and stores them in stash files, which it writes to the $SPLUNK_HOME/var/spool/splunk directory. By default, Splunk sets up a batch input for this directory, so when files are dropped into the spool directory, Splunk indexes them and then deletes them.

If the search head is configured to forward its data, it will not index the files locally, but will instead forward them to its forwarding destination. In other words, where the summary search runs and where its results end up being indexed are independent of each other. In practice, almost everyone has their search heads configured to send data to their indexers, the same indexers they have configured as search peers. That's a best practice.
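
For reference, the batch input mentioned above is defined in the default inputs.conf. From memory it looks roughly like the sketch below; treat the exact attributes as an assumption and check $SPLUNK_HOME/etc/system/default/inputs.conf on your own version rather than relying on this.

```ini
# Sketch of the stock spool input (assumed; verify against your install).
# Batch inputs are destructive: with move_policy = sinkhole the file is
# deleted once it has been read.
[batch://$SPLUNK_HOME/var/spool/splunk]
move_policy = sinkhole
crcSalt = <SOURCE>
```

Because this is an ordinary input, the events it reads are indexed or forwarded like any other data on that instance, which is why the forwarding setup on the search head decides where the summary ends up.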

So let's look at the scenarios you mentioned:

If you run the summary search on a search head, and that search head is configured to forward data to your indexers, then the summary data will be spread across the indexers by the forwarder's load balancing. This is what you want to do. There isn't any need to think about "distributing" the searches to the indexers, or distributing the results across the indexers; the search head takes care of that for you.

If you run the summary search on a search head, and that search head is not configured to forward data, the summary results will be indexed locally on the search head. You might want to do this, but then you'll have to deal with storage on the search head, and as the summary result set grows, searches against it won't scale accordingly, because you'll lose the benefit of distributed search.

If you run the summary search on an indexer, the data will remain on that indexer. You don't want to do this: you have multiple indexers, and a search run directly on one indexer only sees that indexer's data, so your results will be incomplete. In general, you don't want to execute any searches directly on your indexers. Let the search head distribute them.
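
To make the first scenario concrete, here is a minimal sketch of what the scheduled report might look like in savedsearches.conf on the search head. The stanza name, the source index `web`, the filter, and the summary index `my_summary` are all made up for illustration, since the original search was not posted.

```ini
# savedsearches.conf on the search head (hypothetical report and index names)
[Hourly error summary]
# Filter, aggregate, then write the results to the summary index via collect
search = index=web status>=500 | stats count by host | collect index=my_summary
# Run at the top of every hour over the previous full hour
enableSched = 1
cron_schedule = 0 * * * *
dispatch.earliest_time = -1h@h
dispatch.latest_time = @h
```

The search itself fans out to both indexers as search peers; only the collect output lands in the spool directory on the search head, and from there it goes wherever the search head sends its data.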

markwymer
Path Finder

Thanks Jeremiah, that answers everything and gives some very useful background information too.

renjith_nair
Legend

We have almost the same setup and would suggest:

  1. Run the searches on the search head (it's made for that)
  2. Forward the summary index to your load-balanced indexers (indexes are supposed to be on indexers 🙂 )

Configuration for the search head as a forwarder (outputs.conf):

# Turn off indexing on the search head
[indexAndForward]
index = false

[tcpout]
defaultGroup = my_search_peers
forwardedindex.filter.disable = true
indexAndForward = false

[tcpout:my_search_peers]
server=10.10.10.1:9997,10.10.10.2:9997,10.10.10.3:9997
autoLB = true
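
One assumption worth stating: once the search head forwards instead of indexing, the summary index has to be defined on the indexers, otherwise the forwarded events have nowhere to go. A minimal sketch of the stanza for each indexer, using a hypothetical index name `my_summary` (substitute whatever name your collect command writes to):

```ini
# indexes.conf on each indexer (index name is a placeholder)
[my_summary]
homePath   = $SPLUNK_DB/my_summary/db
coldPath   = $SPLUNK_DB/my_summary/colddb
thawedPath = $SPLUNK_DB/my_summary/thaweddb
```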

Happy Splunking!

markwymer
Path Finder

Thanks Nair,

Definitely the answer that I was hoping for.

I don't have the infrastructure in my test environment to try this out, so I thought I would ask the question before diving straight into my live indexers/search heads.

Brgds,
Mark.
