OPSEC LEA R80 logging behind

mmoermans
Path Finder

Ever since the upgrade to R80, the logs from the OPSEC LEA app have been running behind by about an hour (ranging from 30 to 90 minutes throughout the day). What could be causing this? Before the upgrade they were always indexed within seconds.

The opseclea:log:modinput logs don't show any errors, so it's hard to pinpoint the issue.

Action01
Loves-to-Learn

Hi,

We had the same problem, but with latencies ranging from minutes up to 15 hours depending on the traffic load on the Check Point (this was before the upgrade, on R77.30). Latency seemed to build up once we went above about 41,000 events per minute. Performance and resource usage of the HF, the indexers, and the Check Point management server were all fine: no obvious culprit, and no errors or warnings in opseclea:log:modinput.

We have NMON running on the HF, so inspecting CPU and memory usage was easy. Splunkd didn't do much (0.35 CPU cores on average), and lea_loggrabber used close to nothing (0.03). Python, however, continuously used 1.6-1.7 CPU cores. That was the maximum I observed, and it always occurred above 41,000 events per minute; below that rate the process used fewer resources.
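
As a rough illustration of why latency keeps climbing once the event rate passes a fixed ceiling, here is a minimal Python sketch of a first-in, first-out queue. It assumes, hypothetically, that the input can forward at most `capacity_epm` events per minute (about 41,000 in the case above) and that the excess simply waits in a backlog. The function name and the 50,000 events-per-minute arrival rate are illustrative only, not taken from the add-on.

```python
def latency_minutes(duration_min: float, arrival_epm: float, capacity_epm: float) -> float:
    """Estimate how far behind the newest event is after sustained load.

    Hypothetical FIFO model: events arrive at arrival_epm, at most
    capacity_epm are forwarded per minute, and any excess joins a backlog.
    """
    if capacity_epm <= 0:
        raise ValueError("capacity_epm must be positive")
    # Events that could not be forwarded accumulate linearly over time.
    backlog = max(0.0, arrival_epm - capacity_epm) * duration_min
    # The newest event waits until the whole backlog ahead of it is drained.
    return backlog / capacity_epm


if __name__ == "__main__":
    # Illustrative numbers only: 50,000 epm arriving against a 41,000 epm ceiling.
    for hours in (1, 8):
        lag = latency_minutes(hours * 60, 50_000, 41_000)
        print(f"after {hours} h: about {lag:.0f} min behind")
```

In this simplified model, polling more often does not help: the lag only stops growing when the forwarding ceiling rises above the arrival rate.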

After the upgrade to R80.10 it got even worse, with latency reaching 22 hours and still climbing.

(I think) I managed to solve this simply by setting the log level to INFO instead of DEBUG (which I had assumed was necessary for "debugging" this problem...). The DEBUG level was producing half a million events per minute of _internal debug logging...
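
The effect can be reproduced in miniature with Python's standard `logging` module. The sketch below is hypothetical and is not the add-on's code: it assumes an input that writes one DEBUG line per forwarded event plus an occasional INFO progress line, and counts how many log records actually reach a handler at each level. The logger and function names are made up for illustration.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts emitted records instead of writing them anywhere."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.count += 1


def records_emitted(level: int, events: int) -> int:
    """Return how many log records a hypothetical input emits for `events` events."""
    logger = logging.getLogger(f"modinput_sketch_{level}")
    logger.propagate = False  # keep the sketch from writing to the root logger
    logger.setLevel(level)
    handler = CountingHandler()
    logger.addHandler(handler)
    for i in range(events):
        logger.debug("forwarding event %d", i)  # one DEBUG record per event
        if i % 10_000 == 0:
            logger.info("progress: %d events", i)  # occasional INFO record
    logger.removeHandler(handler)
    return handler.count


if __name__ == "__main__":
    for lvl in (logging.DEBUG, logging.INFO):
        print(logging.getLevelName(lvl), records_emitted(lvl, 41_000))
```

At DEBUG, log volume scales one-to-one with event volume, so the logging path competes with the forwarding path for the same CPU; at INFO the per-event calls are filtered out before any formatting happens.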

After changing the level and setting the starttime on each input to a couple of hours earlier (thus skipping most of the 22 hours of latency), Python's CPU usage dropped to only 0.7 CPU cores... and fw1_loggrabber and splunkd spiked to levels not seen before (2 CPU cores each). At around the same time, the metrics log reported 1.3 million events per minute indexed for a couple of minutes. It seems that the DEBUG log level (very) negatively impacted the maximum number of events that python/lea_loggrabber could retrieve and send to Splunk.
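
To get a feel for how quickly a backlog clears once throughput rises, a simple estimate is the backlog size divided by the net drain rate (forwarding rate minus the rate at which new events keep arriving). The Python sketch below assumes both rates stay constant; the function name and the example figures are hypothetical.

```python
def drain_minutes(backlog_events: float, forward_epm: float, arrival_epm: float) -> float:
    """Minutes needed to clear a backlog while new events keep arriving.

    Assumes constant rates; returns infinity if forwarding cannot outpace arrivals.
    """
    net_epm = forward_epm - arrival_epm
    if net_epm <= 0:
        return float("inf")  # the backlog never shrinks
    return backlog_events / net_epm


if __name__ == "__main__":
    # Hypothetical: two hours of backlog at 70,000 epm, forwarded at 1.3 million epm.
    backlog = 2 * 60 * 70_000
    print(f"{drain_minutes(backlog, 1_300_000, 70_000):.1f} minutes")
```

Because drain time scales linearly with backlog size, moving starttime forward (and so discarding most of the backlog) shortens recovery proportionally.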

Some time later, resource usage came down: splunkd and lea_loggrabber both run at 0.1 cores and python at 0.03, at around 70,000 events per minute.

I'll be monitoring closely what happens under more load, but for now it seems all right.

Action01

dominiquevocat
SplunkTrust

The logs also have many new fields, so each event is about 3x larger. Plus, you cannot deselect the fields because they are not exposed in the app's input definitions, and I have not managed to blacklist the superfluous fields. 😕

hatalla
Path Finder

Hey mmoermans,

Did you figure out a solution for this time gap between _time and indextime? We are having the same issue: the events' _time can lag anywhere from a few minutes up to 3 hours behind indextime. I reduced the polling interval to 300 seconds (5 minutes) with no luck; I'm still seeing the gap. We are also using R80.

Thanks.
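
One way to quantify the gap is to compare each event's `_time` with its `_indextime` (both epoch seconds in Splunk). The Python sketch below assumes those pairs have already been exported from a search; the function name and sample values are hypothetical.

```python
from statistics import median


def lag_summary(pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize indexing lag in seconds from (_time, _indextime) pairs."""
    if not pairs:
        raise ValueError("need at least one (_time, _indextime) pair")
    # Lag is how long after the event's own timestamp it was indexed.
    lags = [indexed - event for event, indexed in pairs]
    return {"min": min(lags), "median": median(lags), "max": max(lags)}


if __name__ == "__main__":
    # Hypothetical sample: lags of 1 minute, 5 minutes and 3 hours.
    sample = [(1_700_000_000, 1_700_000_060),
              (1_700_000_100, 1_700_000_400),
              (1_700_000_200, 1_700_011_000)]
    print(lag_summary(sample))
```

If the median lag stays far above the 300-second polling interval, the interval is probably not the main bottleneck.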
