Monitoring Splunk

How to fix failed indexer on Splunkd?

pacifikn
Communicator

Hi all,

One Splunkd indexer is failing while other indexers are running.
I'm also getting a TCPOutAutoLB-0 error.

How can I fix these issues?

Thank you in advance.

0 Karma

PavelP
Motivator

Hello @pacifikn,

As @skoelpin suggested, check splunkd.log and other logs, particularly crash*log, on the failed indexer. If the indexer is down, you will not find anything from it on the MC (Monitoring Console), because it cannot send any logs to the MC.

  1. Check the last lines of $SPLUNK_HOME/var/log/splunk/splunkd.log, especially those with ERROR and WARN severity.
  2. Check whether there are any crash*log files in the $SPLUNK_HOME/var/log/splunk/ folder.
  3. Run systemctl status Splunkd if it is a systemd-enabled Splunk.
  4. Run grep -i splunk /var/log/messages
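
Checks 1 and 2 above can be scripted. The following is a minimal shell sketch, not an official Splunk tool: the function names recent_problems and list_crash_logs are hypothetical, and it assumes that each splunkd.log line carries its severity as a space-delimited word, as in the log lines quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for checks 1 and 2; the names are illustrative only.

# Print the last N (default 20) lines with ERROR or WARN severity.
# Assumed line layout: "<date> <time> <tz> <LEVEL> <Component> - <message>"
recent_problems() {
    log_file=$1
    max_lines=${2:-20}
    grep -E ' (ERROR|WARN) ' "$log_file" | tail -n "$max_lines"
}

# List crash*log files in a log directory, newest first.
# Prints nothing (and still succeeds) when there are none.
list_crash_logs() {
    log_dir=$1
    ls -1t "$log_dir"/crash*log 2>/dev/null || true
}
```

For example, on the failed indexer: recent_problems "$SPLUNK_HOME/var/log/splunk/splunkd.log" 50, followed by list_crash_logs "$SPLUNK_HOME/var/log/splunk".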

Let me know if you find something

0 Karma

pacifikn
Communicator

Dear @PavelP, @skoelpin,

I have run those commands.

Commands 1 and 2:

I chose splunkd.log.5, which is the last of the splunkd.log files listed but not the most recent one. Looking at the WARN and INFO entries gives the output below:

04-23-2020 20:51:17.667 +0200 INFO TcpOutputProc - Connected to idx=host_Ip:9997 ,pset=0 , reuse=0.

host=host_name source=/opt/splunkforwarder/var/log/splunk/splunkd.log sourcetype=splunkd

04-23-2020 21:05:44.329 +0200 WARN TcpOutputFd - Connect to host_Ip:9997 failed . Connection refused

host=host_name source=/opt/splunkforwarder/var/log/splunk/splunkd.log sourcetype=splunkd

04-09-2020 07:26:01.921 +0200 WARN LookupDataProvider - The Value for timeformat '' is invalid.

04-11-2020 03:13:55.944 +0200 INFO TailReader -Batch input finished reading file='/opt/splunk/var/spool/splunk/1586567405_3259.stash_common_action_model' etc...
NB: The problem is that I don't know exactly which error I should be looking for. There is a lot of log information here that I don't fully understand. Is there a known error message I could check for? What I found is listed above, taken from the WARN and INFO entries.


Command 3: systemctl status Splunkd (if it is a systemd-enabled Splunk)

Even though splunkd is not running according to ./splunk status, systemctl shows the following:
splunkd.service -Splunk service
Loaded: loaded (/etc/systemd/system/splunkd.service;enabled;vendor preset: disabled)
Active: active (running) since Sat 2020-04-18 02:14:21 CAT; 5 days ago
process: 73xxx ExecStartPost=/bin/bash -c chown -R ....etc

Command 4: grep -i splunk /var/log/messages

Apr 23 20:10:01 splunksh systemd: Started Session 1065 of user root.
Apr 23 19:50:01 Splunksh systemd: Removed Slice User Slice of root.
Apr 23 20:37:35 Splunksh systemd-logind: New Session 1071of user root.
etc. (the remaining lines are similar to the above)

Can you identify the error from the information above? To be honest, I don't really understand how to investigate this output, find the error, and fix it.
I need help.

0 Karma

PavelP
Motivator

Post the output of
ps aux | grep -i splunk

It seems splunk is running

0 Karma

pacifikn
Communicator

Dear PavelP,
this is the output of the command:

ps aux |grep -i splunk

splunk 103.. 0.4 0.1 3537.. 1047.. ? Ssl Apr17 39:16 splunkd --under-systemd --systemd-delegate=no -p 8189 _internal_launch_under_systemd
splunk 107.. 0.0 0.0 0 0 ? Z Apr17 0:00 [systemctl]
splunk 108.. 0.0 0.0 814.. 95.. ? Ss Apr17 0:35 [splunkd pid=103..] splunkd --under-systemd --systemd-delegate=no -p 8189 _internal_launch_under_systemd [process-runner]
root 1492.. 0.0 0.0 1127.. 996 pts/1 S+ 06:21 0:00 grep --color=auto -i splunk

0 Karma

PavelP
Motivator

Yes, Splunk is running, but I'd expect more processes. Can you post it again with less editing, using the "code sample" button? Please also post the output of "systemctl status Splunkd" again, and don't remove the important parts; they are all hidden behind "...".

0 Karma

skoelpin
SplunkTrust

You're going to want to search the internal logs and check the MC to identify why it stopped before doing anything else.

pacifikn
Communicator

Hello Skoelpin,

The MC shows 1 instance unreachable, and it is the indexer that is down.

When I check whether the Splunk process is running using ./splunk status, it shows that splunkd is not running. How can I start it again using the CLI?

0 Karma

skoelpin
SplunkTrust

I'd strongly recommend identifying why it stopped before starting it. While in the MC, go to Instances, then under "Action", select "Views" and check out the performance and resource usage. You should then look in the internal index and identify any error messages it may have thrown before stopping. You can do this with a query like this:

index=_internal sourcetype=splunkd host=<YOUR INDEXER>

Look for any log levels that are not INFO and the messages that go with them. After you've determined the root cause and you still want to start it, ssh to the indexer and start it.

This assumes your Splunk instance is under /opt:

/opt/splunk/bin/splunk start
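
Once the root cause is known, the start step can be wrapped so that it only runs when the CLI reports splunkd as stopped. This is a minimal sketch under stated assumptions: the function name start_if_stopped is hypothetical, the default path /opt/splunk follows the post above, and the matched phrases ("not running", "is running") are assumed from the status output described in this thread, so verify them against your version.

```shell
#!/bin/sh
# Hypothetical wrapper; start_if_stopped is not a Splunk command.

# Start Splunk only if "splunk status" says splunkd is not running.
# $1: Splunk install directory (assumed default: /opt/splunk).
start_if_stopped() {
    splunk_home=${1:-/opt/splunk}
    # Capture the status text; the command may exit non-zero when stopped.
    status_output=$("$splunk_home/bin/splunk" status 2>&1) || true
    case $status_output in
        *"not running"*)
            "$splunk_home/bin/splunk" start
            ;;
        *"is running"*)
            echo "splunkd already appears to be running; nothing to do"
            ;;
        *)
            # Unrecognized output: report it instead of guessing.
            echo "could not determine status: $status_output" >&2
            return 1
            ;;
    esac
}
```

On a systemd-managed installation, such as the one shown elsewhere in this thread, starting the service through systemctl with the unit name your system uses may be the more appropriate route.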
0 Karma

pacifikn
Communicator

Dear Skoelpin, thank you for your guidance.

I have checked the internal logs and found the output below, which seems abnormal.

Under Event:


04-23-2020 20:51:17.667 +0200 INFO TcpOutputProc - Connected to idx=host_Ip:9997 ,pset=0 , reuse=0.
host=host_name source=/opt/splunkforwarder/var/log/splunk/splunkd.log sourcetype=splunkd


04-23-2020 20:51:17.667 +0200 INFO TcpOutputProc - Closing Stream for idx=host_Ip:9997
host=host_name source=/opt/splunkforwarder/var/log/splunk/splunkd.log sourcetype=splunkd


I see the logs above. How can I fix this?

0 Karma

skoelpin
SplunkTrust

Those are normal, keep looking. Perhaps filter down your query by including log_level!="INFO"
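
When the search head cannot be used, a similar filter can be applied directly to a splunkd.log file on disk. The sketch below is an illustration only: the function name summarize_non_info is hypothetical, and it assumes the fourth and fifth space-separated fields of each line are the log level and the component, as in the events quoted in this thread. Continuation lines of multi-line messages may be miscounted or skipped.

```shell
#!/bin/sh
# Hypothetical helper; roughly mirrors filtering with log_level!="INFO"
# and then counting events per log level and component.

# Print "<count> <LEVEL> <Component>" for every non-INFO combination,
# most frequent first.
summarize_non_info() {
    awk '$4 != "INFO" && $4 ~ /^[A-Z]+$/ { count[$4 " " $5]++ }
         END { for (key in count) print count[key], key }' "$1" |
        sort -k1,1nr -k2
}
```

Calling summarize_non_info /opt/splunk/var/log/splunk/splunkd.log lists the level and component pairs that occur most often first, which helps decide which messages to read in full.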

0 Karma

pacifikn
Communicator

Dear Skoelpin, after adding log_level!="INFO" to the search I got this:


04-23-2020 21:05:44.329 +0200 ERROR TcpOutputFd - Connection to host_Ip:9997 failed
host=host_name source=/opt/splunkforwarder/var/log/splunk/splunkd.log sourcetype=splunkd


04-23-2020 21:05:44.329 +0200 WARN TcpOutputFd - Connect to host_Ip:9997 failed . Connection refused
host=host_name source=/opt/splunkforwarder/var/log/splunk/splunkd.log sourcetype=splunkd


Running the query returns the logs above and others like them; only the timestamps differ, and the error is the same.
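
Because the same message repeats with only the timestamp changing, it can help to collapse the events into one line per refused target. The sketch below is a hedged illustration: the function name refused_targets is hypothetical, and it assumes the "Connect to <host>:<port> failed . Connection refused" wording quoted above.

```shell
#!/bin/sh
# Hypothetical helper; summarizes "Connection refused" events per target.

# Print "<count> <host:port> first=<timestamp> last=<timestamp>" per target.
refused_targets() {
    awk '/Connection refused/ {
             target = ""
             # The target is the word that follows "to" in the message.
             for (i = 1; i < NF; i++)
                 if ($i == "to") { target = $(i + 1); break }
             if (target == "") next
             stamp = $1 " " $2
             if (!(target in first)) first[target] = stamp
             last[target] = stamp
             count[target]++
         }
         END {
             for (t in count)
                 print count[t], t, "first=" first[t], "last=" last[t]
         }' "$1" | sort -k1,1nr -k2
}
```

Note that the source field of these events is /opt/splunkforwarder/var/log/splunk/splunkd.log, which suggests they were written by a forwarder rather than by the indexer itself. "Connection refused" generally means nothing was accepting connections on that port at that moment, so the first and last timestamps give a rough window in which the indexer was not listening on 9997.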

0 Karma