Getting Data In

splunkd Process at 100%, parsingQueue at 1000, how do I determine where the issue lies?

stephanbuys
Path Finder

As per another topic on "answers", I ran the following search:

index=_internal source=metrics.log group=queue | timechart perc95(current_size) by name

This confirms that my parsingqueue is almost always at 1000, which would probably explain why one splunkd process is constantly using 100% of 1 out of my 4 CPUs.
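
For reference, a closely related search (a sketch that assumes the standard blocked field Splunk writes on its group=queue metrics lines) counts how often each queue reports itself as blocked:

index=_internal source=*metrics.log group=queue blocked=true | timechart count by name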

I am also seeing the following sequence of messages roughly every 300ms in splunkd.log; it might be a coincidence, or it might be the cause.

02-22-2011 19:08:59.772 ERROR TcpInputProc - Received unexpected 68021378 byte message! from hostname=txxxxxxxxxx, ip=10.xxxxxxxx, port=45384

02-22-2011 19:08:59.772 INFO  TcpInputProc - Hostname=txxxxxxxxxxxx closed connection

02-22-2011 19:08:59.855 INFO  TcpInputProc - Connection in cooked mode from txxxxxxxxxxxx

02-22-2011 19:08:59.913 INFO  TcpInputProc - Valid signature found

02-22-2011 19:08:59.913 INFO  TcpInputProc - Connection accepted from txxxxxxxxxxx

Is it possible that some input from a forwarder keeps getting reprocessed?

Any pointers truly welcome.

1 Solution

jrodman
Splunk Employee

The TcpInputProc errors you are seeing indicate mangled or invalid input arriving on a splunktcp input. The sender might not be Splunk at all, but something else connecting to that socket. If so, you could quiesce the source program or firewall off its access.
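
To see which senders are producing these messages, a quick sketch (assuming Splunk's automatic key=value extraction picks up the hostname and ip pairs in those splunkd.log lines) is:

index=_internal sourcetype=splunkd ERROR TcpInputProc "Received unexpected" | stats count by hostname, ip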

Alternatively, it might be a quite old 4.0.x/3.4.x forwarder that is mishandling heartbeats. If it is a Splunk forwarder, make sure it is running a relatively current version.
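
If the indexer's metrics.log records per-connection forwarder details (it does in later Splunk releases; I am not certain of the 4.x-era fields), a search along these lines can show which forwarder versions are connecting:

index=_internal source=*metrics.log group=tcpin_connections | stats latest(version) as version by sourceHost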

Splunk using 100% CPU is not so odd if it has work to do. If it is falling behind, it may be useful to look at CPU time by processor in metrics.log to see where most of the time is being spent.
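
For example, something like the following (a sketch based on the group=pipeline lines in metrics.log, which carry per-processor cpu_seconds) charts where the indexing pipelines spend their CPU time:

index=_internal source=*metrics.log group=pipeline | timechart sum(cpu_seconds) by processor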

Indexing can fall behind because of bottlenecks in disk write speed or CPU exhaustion. I'd use system tools (top, iostat) to get an idea of which one applies, then dig in further along those lines.
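
On the Splunk side, you can also chart raw indexing throughput from metrics.log (a sketch assuming the usual group=thruput, name=index_thruput lines) to see whether the indexing rate dips while the queues are full:

index=_internal source=*metrics.log group=thruput name=index_thruput | timechart avg(instantaneous_kbps)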

This will probably end up as a support case, but if you want to get started yourself, links like these are a good place to begin:

http://www.splunk.com/wiki/Community:PerformanceTroubleshooting

http://www.splunk.com/wiki/Deploy:Troubleshooting


jrodman
Splunk Employee

Glad to hear it is fixed! Sorry, it is tricky to handle investigation cases on Splunk Answers.


stephanbuys
Path Finder

Your hints helped us identify the aggqueue and parsingqueue as the culprits. This answer from Gerald helped us fix it:
http://answers.splunk.com/questions/1142/the-aggqueue-and-parsingqueue-consistently-full-blocked-how...
