Splunk app and best practices for indexing

mlstom
New Member

I am developing a Splunk app and just wanted to hear from someone what is considered best practice when it comes to sending events to Splunk to be processed and indexed.

Basically, I am concerned that sending events into Splunk as soon as they are available would take a toll on the indexer, because there would be a constant flow of data every few seconds. On the other hand, waiting for all the data to come in and then indexing it is not an option, because events could keep arriving for days and I can't wait that long to see the data in the system. So my best guess is to cap the number of events indexed at a time: for example, wait for 10,000 events to accumulate and then send them into Splunk for processing. Could someone offer advice on this?
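A count-only cap has the weakness the question itself points out: if events trickle in slowly, 10,000 of them may take days to accumulate. Below is a minimal sketch, assuming the app pushes batches over HTTP (for example to an HTTP Event Collector endpoint, which accepts several JSON event objects concatenated in one request body). It bounds each batch by both count and age, so nothing waits longer than a few seconds. `EventBatcher`, `send`, `max_events`, `max_age_s` and `tick` are hypothetical names, and the actual network call is left to the injected `send` function.

```python
import json
import time
from typing import Callable, List, Optional


class EventBatcher:
    """Hypothetical helper: buffers events and flushes a batch when it holds
    max_events events or its oldest event is max_age_s seconds old."""

    def __init__(
        self,
        send: Callable[[str], None],
        max_events: int = 10000,
        max_age_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send = send              # performs the actual delivery (e.g. an HTTP POST)
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.clock = clock            # injectable so the example can use a fake clock
        self.buffer: List[dict] = []
        self.first_ts: Optional[float] = None

    def add(self, event: dict) -> None:
        if not self.buffer:
            self.first_ts = self.clock()  # age is measured from the oldest buffered event
        self.buffer.append(event)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically (e.g. once a second) so a slow trickle still gets sent."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self.buffer:
            return
        too_big = len(self.buffer) >= self.max_events
        too_old = self.clock() - self.first_ts >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # Serialize as stacked JSON objects, one {"event": ...} per buffered event.
        payload = "".join(json.dumps({"event": e}) for e in self.buffer)
        self.send(payload)
        self.buffer = []
        self.first_ts = None


# Usage with a fake clock so the behaviour is deterministic.
sent: List[str] = []
now = [0.0]
batcher = EventBatcher(sent.append, max_events=3, max_age_s=5.0, clock=lambda: now[0])
for msg in ("a", "b", "c"):
    batcher.add({"msg": msg})   # third add reaches max_events -> flush
now[0] = 1.0
batcher.add({"msg": "d"})       # starts a new batch at t=1
now[0] = 7.0
batcher.tick()                  # "d" is 6 s old >= 5 s -> flush
print(len(sent))  # 2
```

The age bound is what keeps this close to "send as soon as available": with `max_age_s` of a few seconds, batching only trims the number of requests and never holds data back for long.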

sloshburch
Splunk Employee

You're overthinking it. Splunk indexers are specifically designed to handle the constant stream of data coming in. In fact, if for some reason Splunk slows down, the downstream forwarders will just queue up.
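The queuing behaviour described above is a form of backpressure. Here is a minimal Python sketch of the idea, using the standard library's bounded `queue.Queue` as a stand-in (this is an illustration, not how a forwarder is actually implemented). When the slow consumer falls behind, `put()` blocks the producer instead of dropping data, and every event is delivered once the consumer catches up.

```python
import queue
import threading
import time

# Bounded buffer standing in for a forwarder-side queue (illustration only).
q: queue.Queue = queue.Queue(maxsize=50)
received = []


def slow_consumer() -> None:
    """Stands in for an indexer that is temporarily slower than the producer."""
    while True:
        item = q.get()
        if item is None:          # sentinel: producer is done
            break
        time.sleep(0.001)         # simulate slow processing
        received.append(item)


worker = threading.Thread(target=slow_consumer)
worker.start()
for i in range(200):
    q.put(i)                      # blocks whenever 50 items are already waiting
q.put(None)
worker.join()
print(len(received))  # 200
```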

How are you sending the data to Splunk? I ask because in normal usage of Splunk you should never have to worry about this topic.

If I remember correctly, the indexing process uses about one CPU core, so the other cores are available for returning search results on that data. If that capacity is insufficient, add indexing pipelines (more cores used) or add indexers (better data distribution).

Also, won't users be misled if they run reports on the data without realizing that it's incomplete or that it isn't currently being sent?

I'm not sure if you can tell, but I'm very concerned by the question. I am confident that any means of modulating the data flow will provide a terrible experience with the Splunk platform.

Respond back with more info and I'm happy to answer other concerns about this.

richgalloway
SplunkTrust

I don't know about best practices in this area, but IMO, data should be indexed as soon as it's available. Splunk can't act on data it doesn't have.
