Hello,
I have 3 indexers. After one of them was restarted, the master node started crashing, and it now writes a crash log every minute (each time the indexer tries to connect to the cluster).
Crash log below:
[build cd0848707637] 2022-03-29 17:48:34
Received fatal signal 6 (Aborted) on PID 3183981.
Cause:
Signal sent by PID 3183981 running under UID 1004.
Crashing thread: CMAddPeerWorker-5
Registers:
RIP: [0x00007FDB3792137F] gsignal + 271 (libc.so.6 + 0x3737F)
RDI: [0x0000000000000002]
RSI: [0x00007FDB121F9860]
RBP: [0x00007FDB37A74698]
RSP: [0x00007FDB121F9860]
RAX: [0x0000000000000000]
RBX: [0x0000000000000006]
RCX: [0x00007FDB3792137F]
RDX: [0x0000000000000000]
R8: [0x0000000000000000]
R9: [0x00007FDB121F9860]
R10: [0x0000000000000008]
R11: [0x0000000000000246]
R12: [0x0000555F4AA9B818]
R13: [0x0000555F4A93BC02]
R14: [0x00000000000003C2]
R15: [0x00007FDB16506238]
EFL: [0x0000000000000246]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x002B000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace (PIC build):
[0x00007FDB3792137F] gsignal + 271 (libc.so.6 + 0x3737F)
[0x00007FDB3790BDB5] abort + 295 (libc.so.6 + 0x21DB5)
[0x00007FDB3790BC89] ? (libc.so.6 + 0x21C89)
[0x00007FDB37919A76] ? (libc.so.6 + 0x2FA76)
[0x0000555F497B294F] _ZN8CMBucket14setRASummariesERK4GuidRKSt3mapI3Str15CMBucketSummarySt4lessIS4_ESaISt4pairIKS4_S5_EEE + 623 (splunkd + 0x28C694F)
[0x0000555F496C13C8] _ZN15CMAddPeerWorker15finishAddBucketERP8CMBucketR15BucketCSVStruct + 136 (splunkd + 0x27D53C8)
[0x0000555F496C2320] _ZN15CMAddPeerWorker19addStandaloneBucketERK13IndexDataTypeR15BucketCSVStruct + 128 (splunkd + 0x27D6320)
[0x0000555F496C24B3] _ZN15CMAddPeerWorker20processBucketBatchesEv + 291 (splunkd + 0x27D64B3)
[0x0000555F48757588] _ZN15CMAddPeerWorker4mainEv + 552 (splunkd + 0x186B588)
[0x0000555F4959B917] _ZN6Thread8callMainEPv + 135 (splunkd + 0x26AF917)
[0x00007FDB37CB717A] ? (libpthread.so.0 + 0x817A)
[0x00007FDB379E6DC3] clone + 67 (libc.so.6 + 0xFCDC3)
Linux / splunk-master-prod-01.local.ad / 4.18.0-240.1.1.el8_3.x86_64 / #1 SMP Fri Oct 16 13:36:46 EDT 2020 / x86_64
Libc abort message: splunkd: /opt/splunk/src/clustering/CMBucket.cpp:962: void CMBucket::setRASummaries(const Guid&, const CMBucketSummaries&): Assertion `hasPeer(peer)' failed.
/etc/redhat-release: Red Hat Enterprise Linux release 8.5 (Ootpa)
glibc version: 2.28
glibc release: stable
Last errno: 0
Threads running: 103
Runtime: 56.398836s
argv: [splunkd --under-systemd --systemd-delegate=yes -p 8089 _internal_launch_under_systemd]
Regex JIT enabled
RE2 regex engine enabled
using CLOCK_MONOTONIC
Thread: "CMAddPeerWorker-5", did_join=0, ready_to_run=Y, main_thread=N, token=140578878629632
MutexByte: MutexByte-waiting={none}
x86 CPUID registers:
0: 0000000D 756E6547 6C65746E 49656E69
1: 000306F0 07040800 FFFA3203 1F8BFBFF
2: 76036301 00F0B5FF 00000000 00C30000
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000000 00000000 00000000 00000000
6: 00000004 00000000 00000000 00000000
7: 00000000 00000000 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300401 000000FF 00000000 00000000
B: 00000000 00000000 00000047 00000007
C: 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000021 2C100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 2D354520 30383632 20347620
80000004: 2E322040 48473034 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 0000302B 00000000 00000000 00000000
terminating...
As a result, indexer-1 (the one that was rebooted) cannot join the cluster.
Has anyone had this problem, and how did you deal with it?
If more info is needed, I can send it.