500 Internal Server Error after upgrading to 6.3.x?

szabados
Communicator

After upgrading my Splunk cluster to 6.3.1, I get "500 Internal Server Error" constantly after logging in to any of my Splunk instances (search head, deployment server, ...).

My splunkd.log is full of lines like this:
WARN HttpListener - Can't handle request for /services/broker/connect/<peer id>, max thread limit for REST HTTP server is 2729, threads already in use is 2729

My web_service.log ends with things like this when the issue happens:

2015-11-30 08:21:49,790 DEBUG [565c071d9a4b27ac7a58] cplogging:55 - [30/Nov/2015:08:21:49] HTTP Traceback (most recent call last):
File "E:\Splunk\Python-2.7\Lib\site-packages\cherrypy\_cprequest.py", line 606, in respond
cherrypy.response.body = self.handler()
File "E:\Splunk\Python-2.7\Lib\site-packages\cherrypy\_cpdispatch.py", line 25, in __call__
return self.callable(*self.args, **self.kwargs)
File "<string>", line 1, in <lambda>
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 38, in rundecs
return fn(*a, **kw)
File "<string>", line 1, in <lambda>
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 118, in check
return fn(self, *a, **kw)
File "<string>", line 1, in <lambda>
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 167, in validate_ip
return fn(self, *a, **kw)
File "<string>", line 1, in <lambda>
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 246, in preform_sso_check
update_session_user(sessionKey, remote_user)
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\lib\decorators.py", line 189, in update_session_user
en = splunk.entity.getEntity('authentication/users', user, sessionKey=sessionKey)
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\entity.py", line 249, in getEntity
serverResponse, serverContent = rest.simpleRequest(uri, getargs=kwargs, sessionKey=sessionKey, raiseAllErrors=True)
File "E:\Splunk\Python-2.7\Lib\site-packages\splunk\rest\__init__.py", line 567, in simpleRequest
raise splunk.RESTException, (serverResponse.status, serverResponse.messages)
RESTException: [HTTP 503] General server error

stevepraz
Path Finder

What OS are you running on? We are having a similar problem on Windows virtual servers running Splunk following our upgrade to 6.3, and 6.3.1 didn't seem to help either. Recycling Splunk solves the issue temporarily, but it comes back. We first had the issue on our Windows search head; once we rolled back to a 6.2 release, the issue went away. Now we are also seeing the issue on our indexers that are on 6.3, but rolling back isn't really an option there. Our Linux search head hasn't seen the issue.

The system appears to be running into its own self-imposed limit on HTTP threads. When the server is in this condition, it returns 500 errors and eventually fails health checks. However, logging on to the server shows that CPU, memory and other system vitals are barely being used at all.

My first thought was that overriding maxThreads in server.conf would let us escape the issue, but that doesn't solve it either. I'm thinking maybe something in 6.3 is either using more of these threads or not cleaning them up as fast. The problem is that I don't see any way to measure it other than the messages that appear once you've hit the limit.
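For anyone who wants at least a rough picture of how often the limit is being hit, here is a minimal Python sketch that pulls the numbers out of the splunkd.log warning quoted in the original post. It assumes the message format is exactly as shown there; the function names are made up for illustration, and it only counts the moments the limit was reached, not thread usage before that point.

```python
import re

# Matches the tail of the HttpListener warning quoted in the original post.
LIMIT_RE = re.compile(
    r"max thread limit for REST HTTP server is (\d+), "
    r"threads already in use is (\d+)"
)


def thread_limit_hits(lines):
    """Yield (limit, in_use) for each line that reports the thread limit."""
    for line in lines:
        match = LIMIT_RE.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2))


def summarize(lines):
    """Return how many limit warnings were seen and the highest in-use count."""
    hits = list(thread_limit_hits(lines))
    peak = max((in_use for _, in_use in hits), default=0)
    return {"warnings": len(hits), "peak_in_use": peak}
```

To use it, open splunkd.log and pass the file object to summarize(); comparing the warning count before and after a config change gives a crude way to tell whether the change helped.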

szabados
Communicator

Hi,

I tried the same thing, and it seems to have solved the issue for me.
Just a tip: there is a Splunk article about this in which the stanza name is written with a lowercase "s": [httpserver].
In the server.conf spec it is written with a capital "S": [httpServer]. I first copied it with the lowercase "s" and it didn't work; after correcting it to the capital "S", the issue was solved.
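Since the wrong letter case is easy to miss, here is a small Python sketch that flags stanza headers matching httpServer except for case. It is only a plain-text check on the file contents, not anything Splunk provides; the function name is hypothetical and the maxThreads value in the sample is purely illustrative.

```python
import re

# A stanza header is a line of the form [name].
STANZA_RE = re.compile(r"^\s*\[([^\]]+)\]\s*$")
EXPECTED = "httpServer"  # spelling used in the server.conf spec, per the post above


def miscased_stanzas(conf_text, expected=EXPECTED):
    """Return stanza names that equal `expected` except for letter case."""
    found = []
    for line in conf_text.splitlines():
        match = STANZA_RE.match(line)
        if not match:
            continue
        name = match.group(1)
        if name != expected and name.lower() == expected.lower():
            found.append(name)
    return found


# Illustrative input only: the value 5000 is not a recommendation.
sample = """
[httpserver]
maxThreads = 5000
"""
print(miscased_stanzas(sample))  # ['httpserver']
```

An empty list means no mis-cased copy of the stanza was found in the text you passed in.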

stevepraz
Path Finder

Thanks for the update. I stumbled into the same copy/paste issue you mentioned but later realized it. That fix appeared to improve my uptime, but recycles were still required, just less frequently.

Since my last update, I got more information on the case I originally opened: my sales engineer mentioned that there is definitely a bug open for this specific issue, which will be addressed in a future release.

My resolution was to migrate my indexers to Linux to restore stability to my environment.

mzorzi
Splunk Employee

You might just have a resource problem. If your system does meet the recommended specifications, then try the upgrade again, but stop Splunk first.