Monitoring Splunk

How to determine why Splunk 6.2.1 shut down unexpectedly?

sc0tt
Builder

Splunk (6.2.1) unexpectedly shut down and needed to be restarted. There were no issues with the server itself, so the problem seems to be specific to Splunk. Is there a way to determine the root cause? Are there any specific log entries that should be checked?

1 Solution

sc0tt
Builder

Thanks for all the suggestions. It seems it may be a memory issue related to the fact that Splunk 6.2.1 ignores the user's time zone setting for cron-scheduled searches and runs them against system time instead. We have many saved searches scheduled across multiple time zones, but they are now all running at the same EST time, which uses more resources.
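
If it helps anyone else confirm this, a rough check (assuming the scheduler logs for the crash window are still in the default _internal index) is to chart scheduled-search activity and look for everything firing in the same minute:

index=_internal sourcetype=scheduler earliest=-24h | timechart span=1m count by status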


damionsteen
New Member

I ran into the same issue on Red Hat:

Feb 20 15:46:09 DCTM1 kernel: Out of memory: Kill process 27016 (splunkd) score 193 or sacrifice child
Feb 20 15:46:09 DCTM1 kernel: Killed process 27016 (splunkd) total-vm:6018500kB, anon-rss:2398292kB, file-rss:1460kB, shmem-rss:0kB
Feb 20 15:46:09 DCTM1 kernel: splunkd: page allocation failure: order:0, mode:0x201da
Feb 20 15:46:09 DCTM1 kernel: CPU: 1 PID: 27016 Comm: splunkd Not tainted 3.10.0-514.el7.x86_64 #1
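
To see whether splunkd's memory use was climbing before the kill, Splunk's own introspection data is worth a look (assuming resource-usage introspection is enabled, which it is by default on 6.x):

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.process=splunkd | timechart span=5m max(data.mem_used)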



thomrs
Communicator

I had an issue where Red Hat was killing Splunk because of memory (the OOM killer). The messages were in /var/log/messages. As mentioned above, searching _internal (or even _*) should help determine the cause. The system activity dashboards are another good place to look.
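
For example, on Red Hat the OOM killer leaves a trail you can check with (paths assume a default rsyslog setup):

grep -i "out of memory" /var/log/messages
dmesg | grep -i "killed process"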

esix_splunk
Splunk Employee

If the Splunk instance is back up and running, run a search for index=_internal over the time range when it crashed and start looking for events.
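
For example, narrowing to splunkd's own errors and warnings (adjust the time range to the crash window):

index=_internal source=*splunkd.log* (log_level=ERROR OR log_level=WARN) | stats count by component | sort - count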


harshilmarvani1
New Member

Have you checked the crash logs in the Splunk log directory?

Check your ulimit as well. On Linux the default ulimit is 1024, and Splunk suggests at least 8192 for the user that splunkd runs as.
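
For example, on the Splunk server (assuming the default $SPLUNK_HOME layout, and run as the user that splunkd runs as):

ls -lt $SPLUNK_HOME/var/log/splunk/crash-*.log
ulimit -n
cat /proc/$(pgrep -o splunkd)/limits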
