Getting Data In

Why am I getting high CPU and high memory on universal forwarder even though we have very little data coming into Splunk?

robertlynch2020
Motivator

Hi,

We are using a universal forwarder (7.1.6), and the Splunk forwarder is showing high CPU and high memory usage (one whole core of a 20-core box).

However, we are only getting a trickle of data in, so it's not as if we are ingesting millions of log files!

Is there anything I can do to see what is happening inside it?

This is a tail of splunkd.log:

You have new mail in /var/spool/mail/autoengine
dell479srv autoengine /dell479srv2/apps/splunkforwarder_MxOne_Testing_Latest/var/log/
bash$ tail -f splunk/splunkd.log
02-19-2019 15:30:02.144 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:30:02.144 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:35:03.296 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:35:03.296 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:40:02.983 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:40:02.983 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:45:03.007 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:45:03.008 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:50:03.320 +0100 INFO  WatchedFile - File too small to check seekcrc, probably truncated.  Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
02-19-2019 15:50:03.320 +0100 INFO  WatchedFile - Will begin reading at offset=0 for file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
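One way to see what the forwarder is busy with is to count how often each monitored file is re-read from the start. The Python sketch below scans splunkd.log for the WatchedFile message shown above and lists the files with the most full re-reads. The script and the `count_rereads` helper are hypothetical, and the regular expression is an assumption based only on the message wording in this excerpt; other Splunk versions may phrase it differently.

```python
import re
import sys
from collections import Counter

# The WatchedFile message from the splunkd.log excerpt above; the wording is
# taken from that excerpt and may differ in other Splunk versions.
REREAD = re.compile(
    r"WatchedFile - File too small to check seekcrc, probably truncated\."
    r"\s+Will re-read entire file='(?P<path>[^']+)'"
)


def count_rereads(lines):
    """Map each file path to the number of full re-reads logged for it."""
    counts = Counter()
    for line in lines:
        match = REREAD.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python reread_report.py var/log/splunk/splunkd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for path, n in count_rereads(log).most_common(20):
            print(f"{n:6d}  {path}")
```

A file that shows up here every polling cycle (every five minutes in the excerpt above) is a good first suspect for wasted CPU.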

paranjith
Explorer

Can you please specify how this log is generated? From the log snippet you provided, this looks like an issue with the way the log is updated. Splunk monitors the log file, and only the new entries added to it should get indexed. Here, however, each update appears to rewrite the whole file, so Splunk treats it as a new file and indexes it again, since the CRC value has changed:

02-19-2019 15:30:02.144 +0100 INFO WatchedFile - File too small to check seekcrc, probably truncated. Will re-read entire file='/net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/qcst_out_toolsMonitoring_CheckToolLifeCycle.txt'.
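The behaviour described in this answer can be illustrated with a simplified model of a tailing reader. This Python sketch is not Splunk's implementation. It assumes the reader identifies a file by a CRC of its first HEAD_LEN bytes (256, mirroring the default of the inputs.conf `initCrcLength` setting) and, before resuming, re-verifies a CRC of the last SEEK_LEN bytes it already read (SEEK_LEN is an assumed value). `check`, `checkpoint` and `Checkpoint` are hypothetical names.

```python
import os
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple

HEAD_LEN = 256  # bytes hashed to identify a file (mirrors the initCrcLength default)
SEEK_LEN = 256  # bytes before the saved offset that are re-verified (assumed value)


@dataclass
class Checkpoint:
    head_crc: int  # CRC of the first HEAD_LEN bytes
    offset: int    # how far the reader has already read
    seek_crc: int  # CRC of the SEEK_LEN bytes just before offset


def _crc(path: str, start: int, length: int) -> int:
    with open(path, "rb") as f:
        f.seek(start)
        return zlib.crc32(f.read(length))


def checkpoint(path: str) -> Checkpoint:
    """Record the reader's state after consuming the whole file."""
    size = os.path.getsize(path)
    seek_start = max(0, size - SEEK_LEN)
    return Checkpoint(_crc(path, 0, HEAD_LEN), size, _crc(path, seek_start, size - seek_start))


def check(path: str, cp: Optional[Checkpoint]) -> Tuple[str, int]:
    """Return (decision, offset to resume reading from)."""
    if cp is None or _crc(path, 0, HEAD_LEN) != cp.head_crc:
        return "new file: read from start", 0
    size = os.path.getsize(path)
    if size < cp.offset:
        # The bytes that would be verified no longer exist, so the model
        # assumes truncation and starts over (the "too small" case in the log).
        return "too small to check seek CRC: re-read entire file", 0
    seek_start = max(0, cp.offset - SEEK_LEN)
    if _crc(path, seek_start, cp.offset - seek_start) != cp.seek_crc:
        return "content before offset changed: re-read entire file", 0
    if size == cp.offset:
        return "no new data", cp.offset
    return "appended data: continue reading", cp.offset
```

In this model, a writer that only appends always lands in the "appended data" branch, while a writer that rewrites the file with less content than before hits the re-read branch on every rewrite, which matches the message repeating in the log above.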

robertlynch2020
Motivator

Yes, this file keeps getting rewritten (in parts), so Splunk has to re-read it over and over.

All the other files are appended to; this is the only one that is not.

Thanks,
Rob

robertlynch2020
Motivator

Hi

This was the answer. Can you post it as an answer so I can accept it, please?
The file was deleting parts of itself, and Splunk had to keep ingesting it again and again.

Rob

paranjith
Explorer

Posted it 🙂

markusspitzli
Communicator

We had a similar issue, but we were ingesting over a million files on a UF. The issue was that the UF had to monitor too many files; when we switched to the batch:// input, it worked just fine.
I assume you have a similar issue, because the /net folder is designed to contain NFS-shared directories from remote hosts.

robertlynch2020
Motivator

Hi

I can't use batch mode, as other services need the files after Splunk has read them in (batch mode deletes the files, right?).
We also have a lot of files, and this could be causing the issue!

The forwarder is installed on the machine dell479srv, so perhaps we don't need to use /net/; perhaps it is causing an issue.
[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring.../jmap*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist=.*\.log$
sourcetype = jmap
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives
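To gauge how many files a stanza like this makes the forwarder walk and track, a rough Python sketch can apply the stanza's whitelist and blacklist to a directory tree. It assumes, as inputs.conf does, that both regexes are matched against the full file path. It uses the escaped `\.log$` form from the full inputs.conf posted later in this thread, and it does not reproduce the `...` and `*` wildcards of the monitor path, so point it at the directory you want to measure. `scan` is a hypothetical helper.

```python
import os
import re
import sys

# Filters from the jmap stanza, with the dot escaped as in the full inputs.conf.
WHITELIST = re.compile(r".*\.log$")
BLACKLIST = re.compile(r"logs_|fixing_|tps-archives")


def scan(root, whitelist=WHITELIST, blacklist=BLACKLIST):
    """Return (files walked under root, files passing whitelist and not blacklisted).

    Both regexes are searched against the full path, not just the file name.
    """
    walked = matched = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            walked += 1
            path = os.path.join(dirpath, name)
            if whitelist.search(path) and not blacklist.search(path):
                matched += 1
    return walked, matched


if __name__ == "__main__" and len(sys.argv) > 1:
    walked, matched = scan(sys.argv[1])
    print(f"{walked} files walked, {matched} match the stanza's filters")
```

A large gap between the two numbers suggests the monitor path is broader than it needs to be.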

markusspitzli
Communicator

Yes, batch mode deletes the files.
In our case, the application engineer had to copy the files to a dedicated directory where we were able to use batch mode.
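The copy-to-a-dedicated-directory approach can be sketched in Python as follows. It assumes a hypothetical staging directory watched by a `[batch://...]` input with `move_policy = sinkhole`, which deletes files once they are indexed, so only the copies are consumed and the originals stay available to other services. The copy is written under a hidden `.tmp` name and then renamed into place; this assumes the batch input's whitelist does not match `.tmp` files. `stage_for_batch` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def stage_for_batch(src, staging_dir):
    """Copy src into staging_dir, leaving the original file untouched.

    The copy is written to a hidden .tmp file in staging_dir and then renamed,
    so the batch input (assumed not to whitelist .tmp files) never sees a
    partially written copy.
    """
    os.makedirs(staging_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, prefix=".staging-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
        dest = os.path.join(staging_dir, os.path.basename(src))
        os.replace(tmp_path, dest)  # atomic rename within one filesystem (POSIX)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return dest
```

Writing then renaming costs one extra rename per file but avoids indexing half-copied data.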

If possible, I wouldn't use /net; instead, install a UF on each server you want to ingest data from.
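To check whether a monitored path actually goes through NFS (for example via the /net automounter), a Linux-only Python sketch can look up the longest matching mount point in /proc/mounts. `filesystem_type` is a hypothetical helper; depending on the automounter setup, a /net path may report `nfs`, `nfs4` or `autofs`, while the equivalent local path should report the local filesystem type.

```python
import os
import re
import sys

_OCTAL = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    # /proc/mounts escapes spaces, tabs, newlines and backslashes as octal (e.g. \040).
    return _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), field)


def filesystem_type(path, mounts_file="/proc/mounts"):
    """Return (mount point, filesystem type) of the mount containing path (Linux only)."""
    real = os.path.realpath(path)
    best = ("", "unknown")
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue
            mount_point, fstype = _unescape(fields[1]), fields[2]
            prefix = mount_point.rstrip("/") + "/"
            if real == mount_point or real.startswith(prefix):
                # Keep the longest match; on ties the later entry (mounted on top) wins.
                if len(mount_point) >= len(best[0]):
                    best = (mount_point, fstype)
    return best


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in sys.argv[1:]:
        mount_point, fstype = filesystem_type(p)
        print(f"{fstype:10s} {mount_point:30s} {p}")
```

Running it on both the /net/... path and the local path of the same directory shows whether the forwarder's reads are going over the network.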

robertlynch2020
Motivator

Hi

We removed /net, which reduced the usage by 30%, and we also removed some unwanted files we were monitoring.
We might have to move to a dedicated machine. This is a bit annoying, as people often ask me what the impact of Splunk is, and the "nice answer" is "very small", but in this case it's high... hmmm

Cheers for the help
Rob

markusspitzli
Communicator

The Splunk UF doesn't use many resources... usually. But when it has to monitor a lot of files, that's not the case.

You're welcome

markusspitzli
Communicator

What's your configuration on the UF, e.g. inputs.conf and props.conf, if applicable?

robertlynch2020
Motivator

inputs.conf
[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs.../*.json]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = reset_profiler
crcSalt =
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs.../*.log]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = sun_jvm
crcSalt = <SOURCE>
whitelist = .*gc\.log$|.*gc.*\.log$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT.../*.tps]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = tps
crcSalt = <SOURCE>
whitelist = .*\.tps$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring/vmstat/*.log]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = vmstat-linux
crcSalt = <SOURCE>
whitelist = vmstat.*\.log$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring/nicstat/*]
disabled = false
host = MxOne_Testing_Latest 
index = mlc_live
sourcetype = nicstat
crcSalt = <SOURCE>
whitelist = nicstat.*\.log$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring/*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = mx_version
crcSalt = <SOURCE>
whitelist = mx_version_.*$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring/mlc_version/*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = mlc-version
crcSalt = <SOURCE>
whitelist = mlc_version_.*$
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = murex_log4j
whitelist = .*\.log$
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives|errors.log|.*gc\.log$|.*gc.*\.log$

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT.../*.log]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist=mxtiming.*\.log$
blacklist=logs_|fixing_|tps-archives|mxtiming_crv_nr.*
crcSalt = <SOURCE>
sourcetype = MX_TIMING2

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT.../*service.log]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist = (?<NPID>\d*)-\d*-service\.log
blacklist=logs_|fixing_|tps-archives
crcSalt = <SOURCE>
sourcetype = service

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/qcstTools/qcstOutFiles/*_CheckToolLifeCycle.txt]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
sourcetype = tool_lifecycle
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring.../jmap*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist=.*\.log$
sourcetype = jmap
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives

[monitor:///net/dell479srv/dell479srv2/apps/TheOne-RSAT/logs/monitoring.../jstack*]
disabled = false
host = MxOne_Testing_Latest
index = mlc_live
whitelist=.*\.log$
sourcetype = jstack
crcSalt = <SOURCE>
blacklist=logs_|fixing_|tps-archives

props.conf
[splunkd]
EXTRACT-fields = (?i)^(?:[^ ]* ){2}(?:[+-]\d+ )?(?P[^ ]*)\s+(?P[^ ]+) - (?P.+)

[splunk_web_service]
EXTRACT-useragent = userAgent=(?P[^ (]+)
