All Apps and Add-ons

SPLUNKD Agent crashing frequently

harneet85
New Member

Agents are installed on the 3 servers capturing WebSphere Application Server logs. Nothing has been changed in the environment except WAS JVM was patched and then restarted the JVM. After that every other day the agent gets crashed and I am not able to understand from the logs.. I dont think JVM Patching is relevant but still mentioned it. Posting a snippet from a recently crashed agent:

version Splunk 4.2.1 (build 98164)

was 7.0.0.31

OS Linux 2.6.18-348.3.1.el5 #1 SMP Tue Mar 5 13:19:32 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

LOG

Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 29658 running under UID 2441.
Crashing thread: MainTailingThread
Registers:
RIP: [0x0000003809030295] gsignal + 53 (/lib64/libc.so.6)
RDI: [0x00000000000073DA]
RSI: [0x00000000000073F4]
RBP: [0x00002B4911006940]
RSP: [0x00002B4911005628]
RAX: [0x0000000000000000]
RBX: [0x00002B49110056D0]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x0000000000000080]
R9: [0x0101010101010101]
R10: [0x0000000000000008]
R11: [0x0000000000000202]
R12: [0x00007FFF6C43AA7D]
R13: [0x0000000000F984E8]
R14: [0x000000000000021A]
R15: [0x0000000000F97D18]
EFL: [0x0000000000000202]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]

OS: Linux
Arch: x86-64

Backtrace:
[0x0000003809031D40] abort + 272 (/lib64/libc.so.6)
[0x0000003809029716] assert_fail + 246 (/lib64/libc.so.6)
[0x0000000000671673] _ZSt16
introsort_loopIN9_gnu_cxx17normal_iteratorIPP15WatchedTailFileSt6vectorIS3_SaIS3_EEEEl22WTFTimes
tampComparatorEvT_SA_T0_T1
+ 515 (splunkd)
[0x0000000000665C06] _ZN10TailReader11trimOpenFdsEv + 230 (splunkd)
[0x0000000000665FDA] _ZN10TailReader8readDataER15WatchedTailFileP11TailWatcher + 650 (splunkd)
[0x0000000000666448] _ZN10TailReader8readFileER15WatchedTailFileP11TailWatcher + 184 (splunkd)
[0x0000000000666592] _ZN11TailWatcher8readFileER15WatchedTailFile + 82 (splunkd)
[0x0000000000668419] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 1161 (splunkd)
[0x0000000000B6DAFC] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 428 (splunkd)
[0x0000000000BB8E03] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 227 (splunkd)
[0x0000000000B673D8] _ZN9EventLoop3runEv + 216 (splunkd)
[0x000000000066B7AF] _ZN11TailWatcher3runEv + 143 (splunkd)
[0x000000000066D7C1] _ZN13TailingThread4mainEv + 289 (splunkd)
[0x0000000000BB6892] _ZN6Thread8callMainEPv + 66 (splunkd)
[0x0000003809C0683D] ? (/lib64/libpthread.so.0)
[0x00000038090D500D] clone + 109 (/lib64/libc.so.6)
Linux / alt-met-asol-was03 / 2.6.18-348.3.1.el5 / #1 SMP Tue Mar 5 13:19:32 EST 2013 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2013-02-28 13:30:03.674 +0100 splunkd started (build 98164)
2013-04-15 13:46:42.551 +0200 Interrupt signal received
2013-04-15 13:50:48.847 +0200 splunkd started (build 98164)
2013-08-29 04:35:38.328 +0200 splunkd started (build 98164)
2014-02-24 11:45:17.483 +0100 splunkd started (build 98164)
2014-02-26 15:10:11.065 +0100 splunkd started (build 98164)
2014-03-03 11:45:29.363 +0100 Interrupt signal received
2014-03-03 11:45:38.980 +0100 splunkd started (build 98164)
2014-03-05 10:07:33.392 +0100 splunkd started (build 98164)
splunkd: /opt/splunk/p4/splunk/branches/hammer/src/pipeline/input/Tailing.cpp:538: bool WTFTimestampComparator::operator()(con
st WatchedTailFile*, const WatchedTailFile*): Assertion `!x->getLastEOFTime().isZero() && !y->getLastEOFTime().isZero()' failed.

/etc/redhat-release: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
glibc version: 2.5
glibc release: stable
Threads running: 25
argv: [splunkd -p 8090 restart]
terminating...

0 Karma

ekost
Splunk Employee
Splunk Employee

This looks like an old bug in the 4.2.x. Upgrade to at least the last 4.3.x revision or newer and check that there are sufficient file descriptors for splunkd under the running credentials.

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...