splunkd crash during startup - Assertion `bytesToHash < 1048576' failed

jcagle · ‎02-17-2013

This crash is happening every time I try to start splunkd after a new install of splunk 5.0.2 build 149561 on SLES11 SP1 x86_64. However, this same build works fine on SLES11 SP2. I have all the latest kernel updates installed on SP1, by the way.

[build 149561] 2013-02-17 19:24:28
Received fatal signal 6 (Aborted).
 Cause:
   Signal sent by PID 41950 running under UID 0.
 Crashing thread: MainTailingThread
 Registers:
    RIP:  [0x00007FBF56AD5945] gsignal + 53 (/lib64/libc.so.6)
    RDI:  [0x000000000000A3DE]
    RSI:  [0x000000000000A40B]
    RBP:  [0x0000000001306DE0]
    RSP:  [0x00007FBF4AFFE2B8]
    RAX:  [0x0000000000000000]
    RBX:  [0x00007FBF56BC55E0]
    RCX:  [0xFFFFFFFFFFFFFFFF]
    RDX:  [0x0000000000000006]
    R8:  [0x00000000FFFFFFFF]
    R9:  [0x00007FBF56DFCE20]
    R10:  [0x0000000000000008]
    R11:  [0x0000000000000206]
    R12:  [0x00007FFFCC3AC69B]
    R13:  [0x00007FBF56BC55E0]
    R14:  [0x0000000001307930]
    R15:  [0x00000000000000E5]
    EFL:  [0x0000000000000206]
    TRAPNO:  [0x0000000000000000]
    ERR:  [0x0000000000000000]
    CSGSFS:  [0x0000000000000033]
    OLDMASK:  [0x0000000000000000]

 OS: Linux
 Arch: x86-64

 Backtrace:
  [0x00007FBF56AD5945] gsignal + 53 (/lib64/libc.so.6)
  [0x00007FBF56AD6F21] abort + 385 (/lib64/libc.so.6)
  [0x00007FBF56ACE810] __assert_fail + 240 (/lib64/libc.so.6)
  [0x00000000006FCD42] _ZN16FileInputTracker10computeCRCEPm14FileDescriptorRK3Strll + 1906 (splunkd)
  [0x00000000006FCE71] _ZN16FileInputTracker11fileHalfMd5EPm14FileDescriptorRK3Strll + 17 (splunkd)
  [0x000000000071B844] _ZN3WTF13loadFishStateEb + 644 (splunkd)
  [0x000000000070A6C5] _ZN10TailReader8readFileER15WatchedTailFileP11TailWatcher + 149 (splunkd)
  [0x000000000070A8E4] _ZN11TailWatcher8readFileER15WatchedTailFile + 260 (splunkd)
  [0x000000000070C9FB] _ZN11TailWatcher11fileChangedEP16WatchedFileStateRK7Timeval + 363 (splunkd)
  [0x0000000000D3F4E1] _ZN30FilesystemChangeInternalWorker15callFileChangedER7TimevalP16WatchedFileState + 113 (splunkd)
  [0x0000000000D40DCF] _ZN30FilesystemChangeInternalWorker12when_expiredERy + 479 (splunkd)
  [0x0000000000DA5553] _ZN11TimeoutHeap18runExpiredTimeoutsER7Timeval + 227 (splunkd)
  [0x0000000000D3A318] _ZN9EventLoop3runEv + 216 (splunkd)
  [0x000000000071328F] _ZN11TailWatcher3runEv + 143 (splunkd)
  [0x00000000007133EB] _ZN13TailingThread4mainEv + 267 (splunkd)
  [0x0000000000DA2F32] _ZN6Thread8callMainEPv + 66 (splunkd)
  [0x00007FBF58269696] ? (/lib64/libpthread.so.0)
  [0x00007FBF56B77D7D] clone + 109 (/lib64/libc.so.6)
 Linux / dl380-ion1 / 2.6.32.59-0.7-default / #1 SMP Fri Dec 28 20:16:13 zzz 2012 / x86_64
 Last few lines of stderr (may contain info on assertion failure, but also could be old):
    2013-02-17 19:24:26.776 +0000 splunkd started (build 149561)
    splunkd: /opt/splunk/p4/splunk/branches/5.0.2/src/pipeline/input/FileInputTracker.cpp:229: static bool FileInputTracker::computeCRC(uint64_t*, FileDescriptor, const Str&, file_offset_t, file_offset_t): Assertion `bytesToHash < 1048576' failed.

 /etc/SuSE-release: SUSE Linux Enterprise Server 11 (x86_64)
 glibc version: 2.11.1
 glibc release: stable
Threads running: 24
argv: [splunkd -p 8089 start]
terminating...

sarmstrong_splu · ‎03-05-2013

Possible workaround for this bug:

Check the props.conf files on the instance that is crashing due to this bug. If any of them contain 'CHECK_METHOD = modtime' or 'CHECK_METHOD = entire_md5', comment those lines out and restart the instance. Be sure to check under the app contexts as well. One customer found such a setting under the *nix app, and that was the only occurrence he found (that is, it was not under the default context). After he commented it out, splunkd started up as expected.
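To make the check above less manual, here is a minimal sketch in Python that walks the etc directory and lists every uncommented CHECK_METHOD line set to modtime or entire_md5. It assumes Splunk is installed under /opt/splunk unless the SPLUNK_HOME environment variable says otherwise; the function name find_check_method_settings is made up for this illustration and is not part of Splunk. It only reports matches and does not edit any file.

```python
import os
import re

# Matches an uncommented CHECK_METHOD line set to one of the two values
# named in the workaround. Lines starting with '#' do not match.
CHECK_METHOD_RE = re.compile(r"^\s*CHECK_METHOD\s*=\s*(modtime|entire_md5)\s*$")
STANZA_RE = re.compile(r"^\s*\[(.+)\]\s*$")


def find_check_method_settings(etc_dir):
    """Return (path, line_number, stanza, value) for each match under etc_dir."""
    hits = []
    for root, _dirs, files in os.walk(etc_dir):
        if "props.conf" not in files:
            continue
        path = os.path.join(root, "props.conf")
        stanza = None  # name of the most recent [stanza] header, if any
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                stanza_match = STANZA_RE.match(line)
                if stanza_match:
                    stanza = stanza_match.group(1)
                    continue
                setting_match = CHECK_METHOD_RE.match(line)
                if setting_match:
                    hits.append((path, number, stanza, setting_match.group(1)))
    return hits


if __name__ == "__main__":
    # Assumption: default install location; override with SPLUNK_HOME.
    splunk_home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    etc_dir = os.path.join(splunk_home, "etc")
    for path, number, stanza, value in find_check_method_settings(etc_dir):
        print(f"{path}:{number}: [{stanza}] CHECK_METHOD = {value}")
```

Each reported line can then be commented out by hand before restarting the instance.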

(The fix for SPL-58292 is expected to come in an upcoming maintenance release)

Drainy · ‎02-17-2013

This is a known issue: bug SPL-58292.

MainTailingThread crashes splunkd with a message that says 'Assertion failed: bytesToHash < 1048576' (SPL-58292)

Whenever I hit a new problem, I always check the known issues first to be safe:
http://docs.splunk.com/Documentation/Splunk/5.0.2/ReleaseNotes/Knownissues

In the meantime, you should contact Splunk support for more help.

Drainy · ‎02-17-2013

Even without a support contract, you can still submit a support request via https://www.splunk.com/index.php/submit_issue. The only difference is that you have no guaranteed SLA, but someone at Splunk will eventually read it. Your best bet is to include the bug detail in the subject to grab their attention.

jcagle · ‎02-17-2013

Hi Drainy, thanks for your reply. Yes, I know there's a similar issue (and I should have stated that in my posting), but sometimes it helps the engineers debug when they have more than one data point, hence my report here on SLES11 SP1. Also, I don't have a support contract (yet), so there's no way to contact Splunk support. This is my first day using the evaluation software. Glad it works on SP2, though.

