Splunk Forwarder performance (high CPU)

devosr
New Member

We are currently evaluating Splunk. Our environment consists of one indexer and 105 Windows servers with the Splunk Forwarder installed. All forwarders use the same configuration:

  • monitor one directory, with a whitelist option, non-recursive
  • one location in the Windows Event log.

All servers with a forwarder installed run Windows Server 2008 R2. On most systems the forwarder uses few resources, but on others there are constant spikes to 100% CPU.
Only small servers seem to be affected. By small I mean one (virtual) CPU and little system activity. I used Procmon to analyse what is going on, comparing the splunkd.exe process on a busy system (where it runs fine) with the same process on a small system (where it uses lots of CPU).
During the CPU spikes, the affected systems show a large number of QueryDirectory operations. The directory being queried is the one in the monitor stanza. The operation occurs roughly 150,000 times on affected systems, compared to roughly 300 times on systems that run fine (over roughly the same monitoring period).

The configuration is the same, and the forwarders were all installed the same way, from the command line. What could cause the forwarder to query that directory so often and use so much CPU?
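
To repeat the Procmon comparison described above on other servers, the trace can be saved as CSV and the QueryDirectory operations counted per path. The following is only a minimal sketch: it assumes Procmon's default CSV columns ("Process Name", "Operation", "Path"), the function name is hypothetical, and the sample rows are made up for illustration rather than taken from a real trace.

```python
import csv
from collections import Counter

def count_query_directory(lines, process_name="splunkd.exe"):
    """Count QueryDirectory operations per path for one process.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    Column names are assumed to match Procmon's default CSV export.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        same_process = row.get("Process Name", "").lower() == process_name.lower()
        if same_process and row.get("Operation") == "QueryDirectory":
            counts[row.get("Path", "")] += 1
    return counts

# Made-up sample rows, for illustration only.
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:00.0000001","splunkd.exe","1234","QueryDirectory","D:\path\software name\logs","SUCCESS",""
"10:00:00.0000002","splunkd.exe","1234","ReadFile","D:\path\software name\logs\a.log","SUCCESS",""
"10:00:00.0000003","explorer.exe","4321","QueryDirectory","D:\path\software name\logs","SUCCESS",""
"10:00:00.0000004","splunkd.exe","1234","QueryDirectory","D:\path\software name\logs","SUCCESS",""
'''

counts = count_query_directory(SAMPLE.splitlines())
for path, n in counts.most_common():
    print(n, path)
```

For a real export, pass an open file instead of the sample, for example `open(path, newline="", encoding="utf-8-sig")`, since the exported file may start with a byte order mark. Running the same count on an affected and an unaffected server over the same period gives numbers that can be compared directly.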

0 Karma

bohanlon_splunk
Splunk Employee

You could try logging a support case and capturing a procdump as per:
https://answers.splunk.com/answers/5400/high-cpu-usage-on-splunk-forwarder.html

Also worth checking:
- Does your input stanza use a wildcard like * or ... (you said this was non-recursive)?
- Have you got AV on these systems (if so, what exclusions are in place)?
- What commonalities exist between the spiking systems versus the behaving ones?

0 Karma

devosr
New Member

The input stanza that seems to be causing the issue is the following one (I have "anonymized" the settings):

[monitor://D:\path\software name\logs]
disabled = false
index = index_name
sourcetype = sourcetype_name
whitelist = ^.*regex.*expression.*\.log$
recursive = false

So the monitor path itself does not contain ... or * but the whitelist option does.
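
As a quick sanity check of the whitelist, the regular expression can be tried against a few candidate paths outside Splunk. This is only a rough sketch: Splunk matches the whitelist against the file's full path, the pattern below is the anonymized one from the stanza, and the file names are hypothetical. Python's `re` module is used here in place of Splunk's regex engine, which should behave the same for a pattern this simple.

```python
import re

# Anonymized whitelist from the stanza above, used as-is.
WHITELIST = re.compile(r"^.*regex.*expression.*\.log$")

def is_whitelisted(full_path):
    """Return True if the full path matches the whitelist pattern."""
    return WHITELIST.search(full_path) is not None

# Hypothetical file names inside the monitored directory.
candidates = [
    r"D:\path\software name\logs\regex_expression_01.log",
    r"D:\path\software name\logs\other.log",
    r"D:\path\software name\logs\regex_expression_01.log.bak",
]

for path in candidates:
    print(is_whitelisted(path), path)
```

Only the first path matches. A check like this confirms which files the input should pick up, but it does not by itself explain the number of directory queries.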

I'm still waiting for the antivirus team to whitelist all Splunk processes, but the antivirus does not seem to cause issues on the other systems.

The spiking systems and the behaving ones all had the forwarder installed on the same day, in the same way. They also all use the same server class and deployed apps. The only difference is that the spiking systems have one CPU, far less activity, and generate far fewer logs.

I'll generate a procdump during a spike.

0 Karma