Splunk Search

Splunk unable to start and emits "Conf is currently being modified by process ####."

MidGe
Explorer

This morning, after rebooting the computer that Splunk runs on, Splunk refuses to start.

While investigating the problem, I found a few odd things. The most likely culprit is the message [Conf is currently being modified by process 4432], which appears on a number of attempts to start Splunk or, for instance, to check the licence via the CLI. The strange thing is that there does not seem to be a process 4432 running on my computer!

Has Splunk become corrupted somehow?

Here is an odd extract from the logs:
01-28-2013 02:30:55.412 +0800 INFO LicenseMgr - Initing LicenseMgr runContext_splunkd=false
01-28-2013 02:30:55.412 +0800 INFO LMStackMgr - closing stack mgr
01-28-2013 02:30:55.412 +0800 INFO LMSlaveInfo - all slaves cleared
01-28-2013 02:30:55.422 +0800 INFO LMStackMgr - created stack='download-trial'
01-28-2013 02:30:55.422 +0800 INFO LMStackMgr - have to auto-set active stack group='Trial' reason='invalid/missing group id' gidStr='' oldGid=Invalid

I start splunk via the CLI as:

"$ sudo /opt/splunk/bin/splunk start"

I then get the following:

"Splunk> Winning the War on Error

Checking prerequisites...
Checking http port [8000]: open
Checking mgmt port [8089]: open
Checking configuration... Done.
Checking indexes...
Validated databases: _audit _blocksignature _internal _thefishbucket history main os sos sos_summary_daily summary
Done
Checking filesystem compatibility... Done
Checking conf files for typos... Done
All preliminary checks passed.

Conf is currently being modified by process 4432.
Conf is currently being modified by process 4432.
Conf is currently being modified by process 4432.
Conf is currently being modified by process 4432.
Conf is currently being modified by process 4432.
Conf is currently being modified by process 4432.
Starting splunk server daemon (splunkd)...

Timed out waiting for splunkd to start.
Starting splunkweb... Done

If you get stuck, we're here to help.

Look for answers here: http://docs.splunk.com

The Splunk web interface is at http://wolfgang:8000"

and on attempting to reach splunk via the web interface, I get:

"The splunkd daemon cannot be reached by splunkweb. Check that there are no blocked network ports or that splunkd is still running."

With the following at the bottom of the screen:

"You are using wolfgang:8000, which is connected to splunkd @000 at https://127.0.0.1:8089 on Mon Jan 28 04:52:13 2013."

The @000 seems a bit odd, no?

1 Solution

yannK
Splunk Employee

I wonder if you have some files owned by a different user than the one running Splunk. Try these steps (a command sketch follows the list):

  1. Stop Splunk.
  2. Check the presence and the owner/permissions of $SPLUNK_HOME/var/run/splunk/*.pid.
  3. Delete them if they still exist.
  4. If needed, do a chown -R on the Splunk folders to change the owner.
  5. Start Splunk under the correct user.
  6. Double-check that the service starts with the correct user.
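
For reference, a minimal command sketch of those steps, assuming Splunk is installed in /opt/splunk (as in the original post) and is meant to run as a user named splunk; adjust the user and path to your setup:

$ sudo /opt/splunk/bin/splunk stop
$ ls -l /opt/splunk/var/run/splunk/*.pid            # check presence, owner and permissions
$ sudo rm -f /opt/splunk/var/run/splunk/*.pid       # delete any leftover pid files
$ sudo chown -R splunk:splunk /opt/splunk           # only if files are owned by the wrong user
$ sudo -u splunk /opt/splunk/bin/splunk start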

vvereschaka
Engager

In addition to yannK's answer:

Remove the following pid files from $SPLUNK_HOME/var/run/splunk if they exist:
- splunkd.pid.corrupt
- conf-mutator.pid

Then start splunk/splunkforwarder.
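
In concrete terms, something like this (a sketch assuming $SPLUNK_HOME=/opt/splunk):

$ cd /opt/splunk/var/run/splunk
$ sudo rm -f splunkd.pid.corrupt conf-mutator.pid   # harmless if the files are already gone
$ sudo /opt/splunk/bin/splunk start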

jrodman
Splunk Employee

splunkd.pid.corrupt is not used. It comes into existence when splunk does not trust the contents of splunkd.pid: splunkd renames splunkd.pid to splunkd.pid.corrupt if it believes the file is broken (for example, it contains process id 1, or it does not contain an identifiable sequence of numbers).
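
As an illustration of that kind of check (my own sketch, not Splunk's internal logic; the path assumes $SPLUNK_HOME=/opt/splunk), a healthy pid file holds a single plausible process ID:

$ PIDFILE=/opt/splunk/var/run/splunk/splunkd.pid
$ cat "$PIDFILE"                                  # should be a single number, and not 1
$ grep -qx '[0-9]\+' "$PIDFILE" || echo "contents are not a plain number"
$ ps -p "$(cat "$PIDFILE")" -o pid,user,comm      # is that process actually running?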

the_wolverine
Champion

Removing conf-mutator.pid and restarting worked for me.

MidGe
Explorer

Hiya yannK,

Thanks for your reply. This fixed the problem.
I only had to delete the *.pid file that was created around the time the problem started. That file contained a single line, "4432": the non-existent process that Splunk was complaining about and the reason it refused to start.

BTW, that file was owned by root, with the root user having the only permissions on it.

Now, to satisfy my curiosity and my paranoid tendencies: what could be the cause of such behaviour? Should I be concerned about a possible intrusion into my system? Alternatively, what can I do to mitigate the possibility of a recurrence?

Thanks a lot for the answer that did solve my problem.

yannK
Splunk Employee

If Splunk runs as root, it's normal for the file to be owned by root, but I would expect the file to be deleted after a clean stop.
Maybe the process crashed too quickly, or a lock stayed on the file.

Double-check which user Splunk is running as, and check /var/log/messages to see whether the process was killed by the system for OOM.
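
A few commands along those lines (just a sketch; the paths assume $SPLUNK_HOME=/opt/splunk as in this thread and a syslog-style /var/log/messages):

$ ps aux | grep '[s]plunkd'                             # which user splunkd runs as, if it is running at all
$ ls -l /opt/splunk/etc/ /opt/splunk/var/run/splunk/    # who owns the config and run-time files
$ sudo grep -iE 'out of memory|killed process' /var/log/messages | grep -i splunk   # OOM-killer activity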

imanpoeiri
Communicator

Hi @yannK, it doesn't seem to apply in my case. I still can't bring up the Splunk web interface. Any other advice?

In addition, I found this under */log/splunk:

[build 255606] 2015-07-01 11:03:45
C++ exception: object@[0x000000000692ED80], type@[0x0000000140FAEF60]
Exception is Non-continuable
Exception address: [0x000007FEFDF3ADCD]
Crashing thread: IndexerTPoolWorker-0
MxCsr: [0x0000000000001F80]
SegDs: [0x000000000000002B]
SegEs: [0x000000000000002B]
SegFs: [0x0000000000000053]
SegGs: [0x000000000000002B]
SegSs: [0x000000000000002B]
SegCs: [0x0000000000000033]
EFlags: [0x0000000000000202]
Rsp: [0x000000000692EB70]
Rip: [0x000007FEFDF3ADCD] RaiseException + 61/80
Dr0: [0x0000000000000000]
Dr1: [0x0000000000000000]
Dr2: [0x000000000692E628]
Dr3: [0x0000000000000000]
Dr6: [0x000007FEDA2F24A3]
Dr7: [0x0000000000000000]
Rax: [0x000000006113A801]
Rcx: [0x000000000692E560]
Rdx: [0x00000000000000D0]
Rbx: [0x0000000140FAEF60]
Rbp: [0x000000000692ECA0]
Rsi: [0x00000000039DC248]
Rdi: [0x00000000039EE290]
R8: [0x0000000000000000]
R9: [0x0000000000000000]
R10: [0x000000013F470000]
R11: [0x000000000692EBB0]
R12: [0x000000000692F3D8]
R13: [0x00000000039DC1C0]
R14: [0x0000000000000000]
R15: [0x00000000039DC1C0]
DebugControl: [0xFFFFFFFFFFFFFFFE]
LastBranchToRip: [0x0000000000000018]
LastBranchFromRip: [0x0000000000000000]
LastExceptionToRip: [0x0000000005490010]
LastExceptionFromRip: [0x000000013FFC3380]

OS: Windows
Arch: x86-64

Backtrace:
[0x000007FEFDF3ADCD] RaiseException + 61/80
Args: [0x0000000140FAEF60] [0x000000000692ECB0] [0x0000000000000001]
[0x000007FEDA31E92C] CxxThrowException + 212/1124
Args: [0x000000013F470000] [0x0000000000000108] [0x0000000000000108]
[0x000000013F69B74F] ?
Args: [0x0000000000000000] [0x00000000039DD120] [0x000033A054A45C0A]
[0x000000013F69AE53] ?
Args: [0x000000000692F480] [0x0000000003993C48] [0x00000000039DC280]
[0x000000013F69E5E3] ?
Args: [0x0000000003990AF0] [0x00000000039DC280] [0x00000000039DC280]
[0x000000013F69340E] ?
Args: [0x0000000000000000] [0x00000000043B9980] [0x0000000003987AB0]
[0x000000013F6CC427] ?
Args: [0x00000000039DC1C0] [0x0000000003987AB0] [0x0000000000000800]
[0x000000013F6CBE56] ?
Args: [0x0000000000000001] [0x0000000003987AB0] [0x0000000000000000]
[0x000000013F6D2D4B] ?
Args: [0x0000000004382018] [0x00000000043A96B0] [0x0000000000000000]
[0x000000013FA9A668] ?
Args: [0x00000000043A96B0] [0x00000000043B9980] [0x00000000024D0F10]
[0x000000013FC69BFC] ?
Args: [0x00000000043B9980] [0x000007FEDA2E432B] [0x0000000000000000]
[0x000000013F4A97C7] ?
Args: [0x00000000024D0F10] [0x0000000000000000] [0x0000000000000000]
[0x000007FEDA2E3FEF] beginthreadex + 263/284
Args: [0x000007FEDA381DB0] [0x0000000000000000] [0x0000000000000000]
[0x000007FEDA2E4196] endthreadex + 402/404
Args: [0x0000000000000000] [0x0000000000000000] [0x0000000000000000]
[0x00000000770E59DD] BaseThreadInitThunk + 13/96
Args: [0x0000000000000000] [0x0000000000000000] [0x0000000000000000]
[0x00000000777FA651] RtlUserThreadStart + 33/1024
Args: [0x0000000000000000] [0x0000000000000000] [0x0000000000000000]
Crash dump written to: C:\Program Files\Splunk\var\log\splunk\C__Program Files_Splunk_bin_splunkd_exe_crash-2015-07-01-11-03-45.dmp

Splunk ran as local administratorMW7GDTHS0E4NJD /6.1 Service Pack 1
GetLastError(): 0
Threads running: 14
Executable module base: 0x000000013F470000
argv: [Splunkd -p 8089]
Thread: "IndexerTPoolWorker-0", did_join=0, ready_to_run=Y, main_thread=N
First 4 bytes of Thread token @00000000043B9994:
00000000 5c 2f 00 00 |\/..|
00000004
TPool Worker: _shouldJoinAndDelete=N, _id=0
Running TJob: name=TJob

x86 CPUID registers:
0: 0000000D 756E6547 6C65746E 49656E69
1: 000306A9 01100800 7FBAE3FF BFEBFBFF
2: 76035A01 00F0B2FF 00000000 00CA0000
3: 00000000 00000000 00000000 00000000
4: 1C004121 01C0003F 0000003F 00000000
5: 00000040 00000040 00000003 00021120
6: 00000077 00000002 00000009 00000000
7: 00000000 00000281 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300403 00000000 00000000 00000603
B: 00000001 00000002 00000100 00000001
C: 00000000 00000000 00000000 00000000
D: 00000007 00000340 00000340 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000001 28100800
80000002: 20202020 49202020 6C65746E 20295228
80000003: 65726F43 294D5428 2D356920 30323333
80000004: 5043204D 20402055 30362E32 007A4847
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 00003024 00000000 00000000 00000000
terminating...

jrodman
Splunk Employee

Hopefully, in 6.1.4+ / 6.2+, manually deleting pid files should not be necessary.
If it is, please do a little investigation of the system state, file contents, etc., and file a bug.

JeffSchumacher
Engager

Using 6.3.0, manually deleting conf-mutator.pid fixed the same problem for me.

MidGe
Explorer

I checked that there was no process 4432 by doing a "ps aux".

Secondly, even after a reboot it is still complaining about the same process 4432! It seems to me that Splunk has a value of 4432 stored somewhere that persists between reboots and restarts.

jrodman
Splunk Employee

Very early in the life of conf-mutator.pid (5.x), the way the pid was tested would return true for any running THREAD. On modern Linux, thread IDs and process IDs live in the same number space, so you can accidentally match a thread, depending on how you test for processes. You could have checked for a thread with ps auxH (the H option shows threads).
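
To illustrate, a small sketch for Linux (my own example, using the 4432 from this thread as the ID to test):

$ PID=4432
$ ps -p "$PID" -o pid,comm                # matches only an ordinary process with that ID
$ ps -eLf | awk -v id="$PID" '$4 == id'   # -L lists threads; column 4 (LWP) is the thread ID
$ ls -d /proc/*/task/"$PID" 2>/dev/null   # a match here shows which process owns that thread ID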

Drainy
Champion

When you say that there is no process 4432 running, how are you checking for this?

martin_mueller
SplunkTrust

The @000 suggests that splunkweb isn't connected to splunkd, probably because splunkd has not started.
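
For what it's worth, a quick way to confirm that (a sketch, assuming the default management port 8089 and the /opt/splunk install path from the original post):

$ sudo /opt/splunk/bin/splunk status                    # reports whether splunkd and splunkweb are running
$ curl -k https://127.0.0.1:8089/services/server/info   # any HTTP response at all (even a 401) means splunkd is listening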

MidGe
Explorer

Thanks for your attention to this.

I edited my original post, as a comment on your question did not allow enough characters.

Drainy
Champion

Sorry, just to be clear. What happens when you try to start Splunk and how are you starting Splunk?
