Unable to start Hunk, fails with crash report

abacus_machine_
Engager
abc@abc /opt/hunk/bin $ sudo ./splunk start

Splunk> Finding your faults, just like mom.

Checking prerequisites...
    Checking http port [8000]: open
    Checking mgmt port [8089]: open
    Checking appserver port [127.0.0.1:8065]: open
    Checking kvstore port [8191]: open
    Checking configuration...  Done.
    Checking critical directories...    Done
    Checking indexes...
        Validated: _audit _blocksignature _internal _introspection _thefishbucket history main summary
    Done
    Checking filesystem compatibility...  Done
    Checking conf files for problems...
    Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  
Done


Waiting for web server at http://127.0.0.1:8000 to be available...
splunkd 5754 was not running.
Stopping splunk helpers...

Done.
Stopped helpers.
Removing stale pid file... done.


WARNING: web interface does not seem to be available!

Following is the crash report generated in Splunk's log directory:

   [build 237464] 2015-01-07 18:31:12
    Received fatal signal 6 (Aborted).
     Cause:
       Signal sent by PID 5371 running under UID 0.
     Crashing thread: IndexerTPoolWorker-1
     Registers:
        RIP:  [0x00007FE2AE2ACBB9] gsignal + 57 (/lib/x86_64-linux-gnu/libc.so.6)
        RDI:  [0x00000000000014FB]
        RSI:  [0x000000000000150D]
        RBP:  [0x00007FE2A77FE9B0]
        RSP:  [0x00007FE2A77FE7C8]
        RAX:  [0x0000000000000000]
        RBX:  [0x00007FE2A683F1C0]
        RCX:  [0xFFFFFFFFFFFFFFFF]
        RDX:  [0x0000000000000006]
        R8:  [0x00007FE2AE6379D0]
        R9:  [0x0000000001612EEA]
        R10:  [0x0000000000000008]
        R11:  [0x0000000000000202]
        R12:  [0x00007FE2A683F1E0]
        R13:  [0x00007FE2A683C0C0]
        R14:  [0x00007FE2A683C0F8]
        R15:  [0x00007FE2A683C0F0]
        EFL:  [0x0000000000000202]
        TRAPNO:  [0x0000000000000000]
        ERR:  [0x0000000000000000]
        CSGSFS:  [0x0000000000000033]
        OLDMASK:  [0x0000000000000000]

     OS: Linux
     Arch: x86-64

     Backtrace:
      [0x00007FE2AE2ACBB9] gsignal + 57 (/lib/x86_64-linux-gnu/libc.so.6)
      [0x00007FE2AE2AFFC8] abort + 328 (/lib/x86_64-linux-gnu/libc.so.6)
      [0x00000000015BA6C5] _ZN9__gnu_cxx27__verbose_terminate_handlerEv + 245 (splunkd)
      [0x0000000001574BB6] _ZN10__cxxabiv111__terminateEPFvvE + 6 (splunkd)
      [0x0000000001574BE3] ? (splunkd)
      [0x0000000001575F4E] ? (splunkd)
      [0x0000000000A29789] _ZN24DatabaseDirectoryManager20locked_scanDirectoryERKSt3mapI10CMBucketIdNS_6BucketESt4lessIS1_ESaISt4pairIKS1_S2_EEERK8Pathnameb + 1865 (splunkd)
      [0x0000000000A2989D] _ZN24DatabaseDirectoryManager22locked_scanDirectoriesEv + 77 (splunkd)
      [0x0000000000A2BE70] _ZN24DatabaseDirectoryManager29refreshBucketManifest_startupEv + 48 (splunkd)
      [0x0000000000A2C098] _ZN24DatabaseDirectoryManagerC1ERK8PathnameS2_S2_S2_bRK3Str + 392 (splunkd)
      [0x00000000009FF77B] _ZN23DatabasePartitionPolicy48openDatabases_ensureInitialized_directoryManagerEv + 107 (splunkd)
      [0x0000000000A0DA78] _ZN23DatabasePartitionPolicy13openDatabasesEbb + 56 (splunkd)
      [0x0000000000A0DDE5] _ZN23DatabasePartitionPolicy5startEbb + 181 (splunkd)
      [0x0000000000D3F899] _ZN6Worker4mainEv + 57 (splunkd)
      [0x0000000000F4FA7E] _ZN6Thread8callMainEPv + 62 (splunkd)
      [0x00007FE2AE644182] ? (/lib/x86_64-linux-gnu/libpthread.so.0)
      [0x00007FE2AE370EFD] clone + 109 (/lib/x86_64-linux-gnu/libc.so.6)
     Linux / abacus-ThinkPad-W540 / 3.13.0-24-generic / #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 / x86_64
     Last few lines of stderr (may contain info on assertion failure, but also could be old):
        2014-12-18 12:22:10.227 +0530 Interrupt signal received
        2015-01-07 18:30:50.058 +0530 splunkd started (build 237464)
        terminate called after throwing an instance of 'DatabaseDirectoryManagerException'
          what():  idx=_audit bucket=db_1418731414_1418730253_17 Detected directory manually copied into its database, causing id conflicts [path1='/opt/hunk/var/lib/splunk/audit/db/hot_v1_17' path2='/opt/hunk/var/lib/splunk/audit/db/db_1418731414_1418730253_17'].terminate called recursively
        terminate called recursively
        terminate called recursively
        2015-01-07 18:31:12.523 +0530 splunkd started (build 237464)
        terminate called after throwing an instance of 'DatabaseDirectoryManagerException'
        terminate called recursively
        terminate called recursively

     /etc/debian_version: jessie/sid
    Last errno: 0
    Threads running: 17
    argv: [splunkd -p 8089 start]
    Thread: "IndexerTPoolWorker-1", did_join=0, ready_to_run=Y, main_thread=N
    First 8 bytes of Thread token @0x7fe2a7830790:
    00000000  00 f7 7f a7 e2 7f 00 00                           |........|
    00000008
    TPool Worker: _shouldJoinAndDelete=N, _id=1
    Running TJob: name=TJob


    x86 CPUID registers:
             0: 0000000D 756E6547 6C65746E 49656E69
             1: 000306C3 05100800 7FDAFBBF BFEBFBFF
             2: 76036301 00F0B5FF 00000000 00C10000
             3: 00000000 00000000 00000000 00000000
             4: 00000000 00000000 00000000 00000000
             5: 00000040 00000040 00000003 00042120
             6: 00000077 00000002 00000009 00000000
             7: 00000000 00000000 00000000 00000000
             8: 00000000 00000000 00000000 00000000
             9: 00000000 00000000 00000000 00000000
             A: 07300403 00000000 00000000 00000603
             B: 00000000 00000000 000000FF 00000005
             C: 00000000 00000000 00000000 00000000
         D: 00000000 00000000 00000000 00000000
      80000000: 80000008 00000000 00000000 00000000
      80000001: 00000000 00000000 00000021 2C100800
      80000002: 65746E49 2952286C 726F4320 4D542865
      80000003: 37692029 3037342D 20514D30 20555043
      80000004: 2E322040 48473034 0000007A 00000000
      80000005: 00000000 00000000 00000000 00000000
      80000006: 00000000 00000000 01006040 00000000
      80000007: 00000000 00000000 00000000 00000100
      80000008: 00003027 00000000 00000000 00000000
    terminating...

How do I repair these errors?
Yes, I copied the whole Hunk directory over from another physical location.
I tried rebuilding the indexes with

splunk rebuild <bucket directory>

but no luck; it failed with the following output.

USAGE: splunk rebuild <bucketPath> [<indexName>] [--no-log]
The <indexName> parameter is ignored if provided.
Please see 'splunk fsck' for more options.  This command is just a wrapper for 'splunk fsck'.

Redirecting to 'splunkd fsck' with args:
    repair --one-bucket --include-hots --bucket-path=../var/lib/splunk/audit/db/hot_v1_19/ --log-to--splunkd-log 
No bootstrap configuration available for: /etc
WARN  Fsck - Not loading indexes.conf; will proceed with all defaults
WARN  ServerConfig - No value found for listenOnIPv6 setting.  Using the default value of "no"
WARN  ServerConfig - No value found for connectUsingIpVersion setting.  Using the default value of "auto"
ERROR ServerConfig - Found no server name in server.conf.  Please set it.  Will attempt to use default for now.
WARN  ServerConfig - No web configuration present, assuming defaults.
WARN  ServerConfig - No SSL configuration present, assuming SSL using defaults.
ERROR IndexConfig - stanza=default Required parameter=blockSignatureDatabase not configured
WARN  BucketBuilder - Could not read indexes.conf, using bucketRebuildMemoryHint=33554432 (MB=32.000000)
INFO  BucketBuilder - Could not parse server/[diskUsage]/minFreeSpace, defaulting to 2048
INFO  Fsck - (entire bucket) Rebuild for bucket='/opt/hunk/var/lib/splunk/audit/db/hot_v1_19' took 135.8 milliseconds
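The "No bootstrap configuration available for: /etc" line and the relative --bucket-path in the redirect output suggest the rebuild ran without the Splunk environment loaded. A sketch of rerunning the repair that 'splunk rebuild' wraps, with an absolute bucket path (the flags and bucket path are taken from the output above; the SPLUNK_HOME default and the guard around the binary are my assumptions, not Splunk tooling):

```shell
# Sketch only: invoke the fsck repair directly with an absolute path.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/hunk}"
BUCKET="$SPLUNK_HOME/var/lib/splunk/audit/db/hot_v1_19"

# Same command line the 'splunk rebuild' redirect printed above.
CMD="$SPLUNK_HOME/bin/splunk fsck repair --one-bucket --include-hots --bucket-path=$BUCKET"

if [ -x "$SPLUNK_HOME/bin/splunk" ]; then
    $CMD    # run on a real install
else
    echo "splunk not found at $SPLUNK_HOME/bin; would run: $CMD" >&2
fi
```

Running it from a shell where SPLUNK_HOME is set avoids the bootstrap-configuration warnings, since splunkd can then locate server.conf and indexes.conf.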

Ledion_Bitincka
Splunk Employee

It seems like the _audit index somehow got corrupted. If you don't care about its contents, one quick solution would be to move the entire directory away, i.e. mv /opt/hunk/var/lib/splunk/audit /tmp/ and restart. You might have to do this for other indexes if they're corrupt too. Since Hunk doesn't store any data locally in these indexes you should be OK moving them out, but if you do care about their contents, please let us know and we can dig a bit deeper.
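The move-and-restart steps above can be sketched as a small shell helper. This is only a sketch: move_index_aside is a name I made up, not a Splunk command, the backup location is arbitrary, and the paths assume the /opt/hunk install from the question.

```shell
# Park a corrupt index directory instead of deleting it, then restart.
# move_index_aside is a hypothetical helper, not part of Splunk.
move_index_aside() {
    splunk_home="$1"   # e.g. /opt/hunk
    index_dir="$2"     # e.g. audit (backs the _audit index)
    backup_dir="$3"    # e.g. /tmp/hunk-backup

    mkdir -p "$backup_dir"

    # Stop splunkd first so it is not holding the directory open.
    if [ -x "$splunk_home/bin/splunk" ]; then
        "$splunk_home/bin/splunk" stop
    fi

    # Move the index data out of the way; splunkd recreates the
    # directory with fresh buckets on the next start.
    if [ -d "$splunk_home/var/lib/splunk/$index_dir" ]; then
        mv "$splunk_home/var/lib/splunk/$index_dir" "$backup_dir/$index_dir"
    fi

    if [ -x "$splunk_home/bin/splunk" ]; then
        "$splunk_home/bin/splunk" start
    fi
}

# On the box from this thread that would be:
# move_index_aside /opt/hunk audit /tmp/hunk-backup
```

Moving rather than deleting means the old buckets stay available under the backup directory in case you later want to inspect or restore them.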


abacus_machine_
Engager

I agree Hunk does not store data locally, but in my case I have some events in the "main" index. Is there any other way to repair this corrupt index?


Ledion_Bitincka
Splunk Employee

In the failure you showed, it was only the audit index (i.e. index=_audit) that had a problem. Are you saying that the main index (defaultdb) has the same issues?


abacus_machine_
Engager

Moving the audit index alone didn't help, but moving _introspection as well as _internaldb got it working. I don't think this is an optimal fix, though; we should find a way to recover a corrupt index for inspection purposes. In some situations we can't afford to lose the _internal db.
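For what it's worth, the exception in the crash report names two directories claiming the same bucket id 17: hot_v1_17 and db_1418731414_1418730253_17 under audit/db ("Detected directory manually copied into its database, causing id conflicts"). If keeping the data matters, one possible data-preserving approach (a sketch, not verified against this install) is to park only the stray duplicate hot bucket rather than the whole index. The directory names below come straight from the crash report; the SPLUNK_HOME default and parking location are assumptions.

```shell
# Sketch: park only the duplicate hot bucket named in the exception,
# leaving the rest of the _audit index intact.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/hunk}"
DB="$SPLUNK_HOME/var/lib/splunk/audit/db"
PARKED="${PARKED:-/tmp/hunk-conflicts}"

mkdir -p "$PARKED"

# The db_1418731414_1418730253_17 bucket keeps the data; only the
# conflicting hot_v1_17 copy is moved out of the way.
if [ -d "$DB/hot_v1_17" ]; then
    mv "$DB/hot_v1_17" "$PARKED/hot_v1_17"
fi
```

If the same pattern caused the _introspection and _internal failures, the corresponding duplicate directories there could be parked the same way before resorting to moving the whole index.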
