Hi,
I've set up a Unix universal forwarder to monitor text-based files on a system.
I always thought forwarders have a small footprint, but my forwarder currently eats up 17% of the CPU of the machine it's installed on.
I checked everything and found something weird.
Splunkd_access.log writes approx. 2 MB of data every second. Splunkd_access.log rolls about every two minutes.
Splunk-Forwarder-Version: 6.4.1
Splunkd_access.log shows the following constant output:
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
-somedate- "POST /services/shcluster/member/consensus/pseudoid/raft_request_vote?output_mode=json HTTP/1.1" 401 71 - - - 0ms
Meanwhile, splunkd.log repeatedly logs this:
-somedate- INFO WatchedFile - Checksum for seekptr didn't match, will re-read entire file='/opt/splunkforwarder/var/log/splunk/splunkd_access.log'.
-somedate- INFO WatchedFile - Will begin reading at offset=0 for file='/opt/splunkforwarder/var/log/splunk/splunkd_access.log'.
Anyone here who has seen this strange behavior before?
Thanks in advance!
Best regards,
pyro_wood
The splunkd.log part is benign, just a sign of log rotation happening in splunkd_access.log.
The access logs suggest someone is trying to make the forwarder vote in a search head cluster captain election.
That makes no sense for a forwarder. On top of what @muebel said, make sure no SHC is configured with this machine as a member.
The client IP listed in the access log should be a good clue as to where to look for misconfiguration first.
Make sure no other cluster members remember this forwarder as a former member on top of cleaning up the forwarder itself.
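To follow up on the client IP hint, here is a rough sketch that counts the raft_request_vote requests per client. The function name is made up for illustration, and it assumes the client IP is the first whitespace-separated field of each splunkd_access.log line (the sample in the question has that part replaced by -somedate-, so check your own file first).

```shell
# Hypothetical helper: count raft_request_vote requests per client, busiest first.
# Assumption: the client IP is the first whitespace-separated field of each line.
count_vote_clients() {
    grep -F 'raft_request_vote' "$1" \
        | awk '{ print $1 }' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Usage (path taken from the splunkd.log messages in the question):
# count_vote_clients /opt/splunkforwarder/var/log/splunk/splunkd_access.log
```

Whatever hosts end up at the top of that list are probably the search head cluster members that still think this machine is one of them.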
Thank you muebel and martin_mueller for your suggestions.
Martin is indeed right with his assumption. This machine was previously configured as part of a search head cluster, but that Splunk installation has since been deleted.
Anyway... I have now ordered a complete wipe and reset of the machine, after which everything should be fine again.
Thanks to you two for the quick responses 🙂
Why CPU spikes and stays at 100% installing universal forwarder on server 2012 R2?
A good one at - High cpu usage on splunk forwarder
About Checksum for seekptr didn't match, will re-read entire file='access.log'
Explanation of Checksum for seekptr didn't match, will re-read entire file
Thanks for nothing. Links don't help either. Better read the question first next time!
Very weird... I'd check for any shcluster-related config:
/opt/splunkforwarder/bin/splunk btool --debug server list | less
Search the output for any mention of shcluster.
Other than that you might have a bug on your hands.
Do you have multiple systems expressing this behavior?
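Instead of paging through the btool output by hand, a small filter can print only the shcluster-related stanzas. This is just a sketch: the function name is hypothetical, and it assumes that with --debug every line is the source conf file path followed by the setting, so a stanza header like [shclustering] is the last field on its line.

```shell
# Hypothetical helper: read btool --debug output on stdin and print only
# stanzas whose name mentions shcluster, including the settings inside them.
show_shcluster_stanzas() {
    awk '
        # Assumption: each line is "<conf file path> <setting>", so a stanza
        # header "[name]" shows up as the last field of its line.
        $NF ~ /^\[.*\]$/ { in_shc = (tolower($NF) ~ /shcluster/) }
        in_shc { print }
    '
}

# Usage:
# /opt/splunkforwarder/bin/splunk btool --debug server list | show_shcluster_stanzas
```

The file path at the start of each printed line shows where a setting comes from. Anything outside the shipped default directories is a leftover worth cleaning up.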