
Errors on OPSEC LEA Forwarder

sha1020
Explorer

Hi,

I have a heavy forwarder running the OPSEC LEA Add-on (version 3.1) and collecting logs from a Provider-1 with about 100 CMAs.

Load on the forwarder is rather high (~10-18), and splunkd.log contains a lot of messages like:

03-10-2016 12:05:42.812 +0100 WARN  HttpListener - Socket error from 127.0.0.1 while accessing /servicesNS/nobody/Splunk_TA_opseclea_linux22/configs/conf-opsec-entity-health/clm_xxxx: Broken pipe
03-10-2016 12:05:43.982 +0100 WARN  ConfMetrics - single_action=ACQUIRE_MUTEX took wallclock_ms=103963
[...]
03-10-2016 14:10:38.100 +0100 WARN  ConfMetrics - single_action=ACQUIRE_MUTEX took wallclock_ms=139931
03-10-2016 14:10:38.865 +0100 WARN  ConfMetrics - single_action=ACQUIRE_MUTEX took wallclock_ms=140866
03-10-2016 14:10:39.624 +0100 WARN  ConfMetrics - single_action=ACQUIRE_MUTEX took wallclock_ms=141386
03-10-2016 14:10:40.389 +0100 WARN  ConfMetrics - single_action=ACQUIRE_MUTEX took wallclock_ms=137119

These messages repeat every second.

Can someone tell me what these warnings mean and whether they can be turned off?

Thanks a lot.


ryandg
Communicator
 03-10-2016 12:05:42.812 +0100 WARN  HttpListener - Socket error from 127.0.0.1 while accessing /servicesNS/nobody/Splunk_TA_opseclea_linux22/configs/conf-opsec-entity-health/clm_xxxx: Broken pipe

This indicates that you are maxing out the HTTP threads configured in server.conf:

 maxThreads = <int>
     * Number of threads that can be used by active HTTP transactions.
       This can be limited to constrain resource usage.
     * If set to 0 (the default) a limit will be automatically picked
       based on estimated server capacity.
     * If set to a negative number, no limit will be enforced.
 maxSockets = <int>
     * Number of simultaneous HTTP connections that we will accept.
       This can be limited to constrain resource usage.
     * If set to 0 (the default) a limit will be automatically picked
       based on estimated server capacity.
     * If set to a negative number, no limit will be enforced.
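If the forwarder really is hitting those limits, one option is to raise or remove them explicitly in server.conf. A minimal sketch, assuming the settings live in the [httpServer] stanza as in the spec above; the -1 values simply disable the limits and are illustrative, not a recommendation:

 # $SPLUNK_HOME/etc/system/local/server.conf
 [httpServer]
 # -1 = no limit; 0 (default) = let splunkd pick a limit based on estimated capacity
 maxThreads = -1
 maxSockets = -1

Restart splunkd afterwards; you can confirm the effective settings with splunk btool server list httpServer.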

The other warning indicates that a bundle being pushed to the server is taking longer than Splunk's preferred threshold.
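If you want to quantify how bad it is before changing anything, a simple search over the internal logs can chart those mutex wait times. This is a sketch; the wallclock_ms and single_action fields are assumed to auto-extract from the key=value pairs in the warning above:

 index=_internal sourcetype=splunkd component=ConfMetrics single_action=ACQUIRE_MUTEX
 | timechart span=5m avg(wallclock_ms) max(wallclock_ms)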

Honestly, with 100 CMAs you should NOT have it all on one dedicated HF, unless each has barely any activity, in which case why do you even have 100 CMAs? In my current environment we had to load balance 14 CMAs across 3 HFs dedicated purely to OPSEC; otherwise we lost massive amounts of packets and had performance issues.
