Getting Data In

Selective indexing and forwarding based on source

Genti
Splunk Employee

Use case:
I have three indexers: A, B and C. Indexer A is monitoring 10 sources. I would like to index 5 of these sources on indexer A. Of the other 5, I would like to send 2 to indexer B and the remaining 3 to indexer C. At the same time, I do NOT want these last 5 sources to show up on indexer A.

I am told that one way to do this is by installing a second instance of Splunk and using one instance for indexing and the other for forwarding.
This is not feasible. Is there no other way to achieve this?

1 Solution

Genti
Splunk Employee

So,
a lot of cases and questions come in regarding selective indexing and selective forwarding. So far, selective indexing has not really been possible. Our awesome engineers, however, have created a way to allow selective indexing by means of a special key that they have added to the inputs.conf file. This has yet to be documented, but in my testing it has worked as expected.

As mentioned in the question, there are a lot of use cases where one would like to index only some data but forward other data. At the same time, there are plenty of cases where one wants to forward some data to one place and some to another. The following example should accomplish both, and I hope it will answer a lot of questions out there.

The following is the simplest and shortest way I was able to come up with, and it should have no impact on performance (other methods require props/transforms and regexes, which might add some overhead).

The example uses the _TCP_ROUTING key, which was introduced and documented in the 4.1 release, as well as two new keys (_INDEX_AND_FORWARD_ROUTING and selectiveIndexing) that, as I mentioned, have yet to be documented but worked without problems in my testing.

Let us assume I have three sources located at /mydata/ called source1.log, source2.log and source3.log. Let us also assume that we want to index source1.log on indexer A (the files are local to indexer A), send source2.log to indexer B and send source3.log to indexer C.

The following configuration achieves this.

  • in inputs.conf create the following stanzas:

    [monitor:///mydata/source1.log]
    _INDEX_AND_FORWARD_ROUTING=local

    [monitor:///mydata/source2.log]
    _TCP_ROUTING=indexerB_9997

    [monitor:///mydata/source3.log]
    _TCP_ROUTING=indexerC_9997

  • in outputs.conf have the following:

    [tcpout]
    defaultGroup=noforward
    disabled=false

    [indexAndForward]
    index=true
    selectiveIndexing=true

    [tcpout:indexerB_9997]
    server = indexerB:9997

    [tcpout:indexerC_9997]
    server = indexerC:9997

And that's it!
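
To map this back onto the use case in the question (10 sources: 5 indexed on A, 2 sent to B, 3 sent to C), here is a minimal sketch of inputs.conf on indexer A. It assumes, purely for illustration, that the sources are grouped into three hypothetical directories under /mydata/, and that outputs.conf defines tcpout groups named indexerB_9997 and indexerC_9997 along with selectiveIndexing=true. None of these directory names come from the original question.

```ini
# inputs.conf on indexer A (sketch; directory names are hypothetical)

# The 5 sources to index locally on A
[monitor:///mydata/keep_on_A]
_INDEX_AND_FORWARD_ROUTING=local

# The 2 sources to forward to indexer B
[monitor:///mydata/send_to_B]
_TCP_ROUTING=indexerB_9997

# The 3 sources to forward to indexer C
[monitor:///mydata/send_to_C]
_TCP_ROUTING=indexerC_9997
```

If the files cannot be grouped by directory, one monitor stanza per file with the same keys should behave the same way.
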
Explanation and Notes:
1. The keys that make this work are _INDEX_AND_FORWARD_ROUTING and selectiveIndexing=true.
2. _INDEX_AND_FORWARD_ROUTING is set to local here, but the value can be anything; all that is required is that the key has a value. You could set it to TRUE if you like, but that might make it look like it takes a Boolean flag, which it doesn't.
3. The ONLY sources that WILL be indexed on indexer A are those that have _INDEX_AND_FORWARD_ROUTING set in their inputs stanzas. If a monitor stanza does NOT have the _INDEX_AND_FORWARD_ROUTING key, its data will NOT be indexed. This point is very important because, for example, the default stanza for the /var/log/splunk directory in etc/system/default/inputs.conf does not have this key set. As such, these logs will NOT be indexed on indexer A.
4. The default [tcpout] stanza that goes nowhere (its defaultGroup, noforward, has no matching target group stanza) is required; otherwise all data will be forwarded instead of being routed by source.
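
As an illustration of note 3, the sketch below shows one way to keep Splunk's own logs indexed on indexer A once selectiveIndexing=true is in effect. It assumes the default stanza is named [monitor://$SPLUNK_HOME/var/log/splunk] on your version, so it is worth confirming the exact stanza name first (for example with splunk btool inputs list --debug) before adding the override.

```ini
# etc/system/local/inputs.conf on indexer A (sketch)
# Assumption: the stanza name matches the one in etc/system/default/inputs.conf,
# so this setting is merged into the default stanza rather than creating a new input.
[monitor://$SPLUNK_HOME/var/log/splunk]
_INDEX_AND_FORWARD_ROUTING=local
```

Any other default inputs you still want indexed locally would need the same key added in the same way.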

I hope this helps folks out there that have been wanting to do selective indexing for a while now.
Cheers,
.gz

gregbo
Communicator

I've added _INDEX_AND_FORWARDING=local to my inputs.conf for /var/log/splunk, and it works for splunkd.log, but I'm not getting anything in the _audit index. I can't find the stanza for that, so I don't know where to add _INDEX_AND_FORWARDING=local. What stanza do I use?

0 Karma

sloshburch
Splunk Employee

Well, the fschange stanza in system appears to have some of that, BUT there might be an index-specific way to manage what you're up to. For example, if you peek at outputs.conf you should see that you can selectively forward indexes. Furthermore, if you want folks to NOT see that data, simply use roles to restrict access to it. If you are trying to keep it from taking up space, you can give the index a brief retention period.

While it's more configuration, I personally find it more straightforward, since those are some very common configuration items, while _INDEX_AND_FORWARDING gets used less often (and therefore has less expertise around it).
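
A rough sketch of that index-based alternative might look like the following. The index name short_lived, the role name role_restricted and the one-day retention are all hypothetical, and the forwardedindex rules assume you are fine replacing whatever filters your outputs.conf currently inherits, so treat this as a starting point to check against the spec files rather than a drop-in config.

```ini
# outputs.conf (sketch): forward only the hypothetical short_lived index
[tcpout]
forwardedindex.0.blacklist = .*
forwardedindex.1.whitelist = short_lived

# indexes.conf (sketch): keep the local copy for roughly one day
[short_lived]
homePath   = $SPLUNK_DB/short_lived/db
coldPath   = $SPLUNK_DB/short_lived/colddb
thawedPath = $SPLUNK_DB/short_lived/thaweddb
frozenTimePeriodInSecs = 86400

# authorize.conf (sketch): this hypothetical role can search main only
[role_restricted]
srchIndexesAllowed = main
```

The trade-off is that the data is still indexed locally for a short time, which is different from the selective indexing approach, where it never hits the local index at all.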

0 Karma
