Splunk Search

Correlate STUCK and UNSTUCK weblogic threads

dbcase
Motivator

Hi,

I have the events below. What I need to do is correlate the execute thread (the second one in the event) with a STUCK message. That part is easy enough; where I get stumped is correlating further with an unstuck message. What I'm hoping to have at the end is a table that shows the URL (in this example it is GET /rest/icontrol/sites/72178/rules HTTP/1.1), then a column with the word STUCK, then another column with the word UNSTUCK or blank.

My effort so far is:

index=cox STUCK|rex "GET\s(?<URL>\S+)"|rex "\[STUCK] ExecuteThread:\s'(?<threadID>\S+)[']"|dedup threadID host|stats count by URL host threadID|sort host threadID

This just extracts the URL and threadID from STUCK threads and builds a simple table. I'm stuck on matching up the corresponding UNSTUCK message.

Event data:

1/23/17
11:02:50.000 PM 
####<Jan 23, 2017 11:02:50 PM EST> <Info> <WebLogicServer> <ccivirpxa0721> <managedServer12> <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1485230570155> <BEA-000339> <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)' has become "unstuck".> 
host =  portal2 index = linecount = 1 source =  /var/nfs/SAT_SplunkLogs/weblogic/portal2/Portal2_managedServer12.log00477.zip:./managedServer12.log00477 sourcetype =   wls_managedserver splunk_server =   idx6.icontrol.splunkcloud.com
1/23/17
11:02:40.000 PM 
####<Jan 23, 2017 11:02:40 PM EST> <Info> <WebLogicServer> <ccivirpxa0721> <managedServer12> <[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1485230560319> <BEA-000339> <[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)' has become "unstuck".> 
host =  portal2 index = linecount = 1 source =  /var/nfs/SAT_SplunkLogs/weblogic/portal2/Portal2_managedServer12.log00477.zip:./managedServer12.log00477 sourcetype =   wls_managedserver splunk_server =   idx6.icontrol.splunkcloud.com
1/23/17
11:01:58.000 PM 
####<Jan 23, 2017 11:01:58 PM EST> <Error> <WebLogicServer> <ccivirpxa0721> <managedServer12> <[ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1485230518746> <BEA-000337> <[STUCK] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "652" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 652073 ms
[
GET /rest/icontrol/sites/72178/rules HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.0; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
X-format: json
X-ClientInfo: 7.3.8.77
Referer: https://portal.company.com/sp/
Cookie: JSESSIONID=F9XOmTyCrkis7WDdsku5tZO09U0_te9pCpjfHxhsAIqM30KIB53j!-695380872
Via: 1.1 10.210.192.38
X-Forwarded-For: 10.210.192.5
X-Forwarded-Host: portal.company.com
X-Forwarded-Server: 10.210.192.38
Connection: Keep-Alive
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
<snip>

lguinn2
Legend

Try this

index=cox "stuck" OR "unstuck"
| rex "GET\s(?<URL>\S+)"
| rex "\[(?<threadStatus>[STUCK|ACTIVE])\] ExecuteThread:\s'(?<threadID>\S+)[']"
| eval timestamp=strftime(_time,"%x %X")
| eval threadStatus=if(threadStatus=="ACTIVE","Unstuck",threadStatus)
| sort _time
| stats list(timestamp) as Time list(threadStatus) as "Thread Status" by host threadID

It may not be the format that you asked for, but I think it will work!

dbcase
Motivator

Hi lguinn,

Hmmm, I think you have made it MUCH closer, but the rex for threadStatus isn't working.

dbcase
Motivator

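Found it. This fragment:

\[(?<threadStatus>[STUCK|ACTIVE])\]

needed to be modified to this:

\[(?<threadStatus>[STUCK|ACTIVE]+)\]

Without the +, the character class matches only a single character, so the pattern never matched [STUCK] or [ACTIVE].

Now sorting through the results; I will let you know if there are any other modifications.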

THANKS!!!
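
For anyone following along, here is a minimal sketch of why the original pattern failed. It is written in Python rather than SPL, purely so it can be run outside Splunk. The sample strings are shortened from the events above. Note that Python writes named groups as (?P<name>...), while rex uses (?<name>...). The third pattern, a plain alternation, is a stricter alternative that was not used in this thread; it is shown only for comparison.

```python
import re

# Shortened fragments modeled on the events posted in this thread.
stuck = "<[STUCK] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy"
active = "<[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)' has become \"unstuck\"."

# [STUCK|ACTIVE] is a character class: it matches exactly ONE character.
single_char = re.compile(r"\[(?P<threadStatus>[STUCK|ACTIVE])\] ExecuteThread:\s'(?P<threadID>\S+)[']")
# Adding + lets the class repeat, so it can span the whole word.
char_class_plus = re.compile(r"\[(?P<threadStatus>[STUCK|ACTIVE]+)\] ExecuteThread:\s'(?P<threadID>\S+)[']")
# Alternation (assumption: not from the thread) accepts only the two literal words.
alternation = re.compile(r"\[(?P<threadStatus>STUCK|ACTIVE)\] ExecuteThread:\s'(?P<threadID>\S+)[']")

def extract(pattern, text):
    """Return (threadStatus, threadID) for the first match, or None."""
    m = pattern.search(text)
    return (m.group("threadStatus"), m.group("threadID")) if m else None

print(extract(single_char, stuck))       # None
print(extract(char_class_plus, stuck))   # ('STUCK', '20')
print(extract(alternation, active))      # ('ACTIVE', '9')
```

The + version works on this data, but because it is still a character class it would also accept any mix of those letters (for example a made-up "[KCUTS]"), which the alternation form rejects.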

lguinn2
Legend

Good catch on the regular expression!

dbcase
Motivator

Hi lguinn,

Based upon your query (thank you!), I modified it to get what I was hoping for. The end result looks like this:

index=cox stuck OR unstuck
| rex "GET\s(?<URL>\S+)"
| rex "(?<threadStatus>(STUCK|unstuck))"
| rex "(?:.*?ExecuteThread:\s'){2}(?<threadID>\S+)[']"
| eval timestamp=strftime(_time,"%x %X")
| sort _time
| dedup threadID host _time
| stats list(timestamp) as Time list(threadStatus) as "Thread Status" by host threadID
| sort host threadID
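
To illustrate the correlation idea outside Splunk, here is a rough Python sketch that pairs STUCK and unstuck events by host and thread ID and produces the URL / STUCK / UNSTUCK-or-blank layout asked for in the original question. The events list is hypothetical and heavily abbreviated (the queue name 'q' and the second URL are made up), and the patterns are Python translations of the rex expressions above, so treat it as a sketch under those assumptions rather than a drop-in replacement for the search.

```python
import re

# Hypothetical, abbreviated events as (host, raw text), oldest first.
events = [
    ("portal2", "<[ACTIVE] ExecuteThread: '26' for queue: 'q'> <BEA-000337> "
                "<[STUCK] ExecuteThread: '20' for queue: 'q' has been busy\n[\n"
                "GET /rest/icontrol/sites/72178/rules HTTP/1.1\n]"),
    ("portal2", "<[ACTIVE] ExecuteThread: '20' for queue: 'q'> <BEA-000339> "
                "<[ACTIVE] ExecuteThread: '20' for queue: 'q' has become \"unstuck\".>"),
    ("portal2", "<[ACTIVE] ExecuteThread: '3' for queue: 'q'> <BEA-000337> "
                "<[STUCK] ExecuteThread: '7' for queue: 'q' has been busy\n[\n"
                "GET /rest/other HTTP/1.1\n]"),
]

URL_RE = re.compile(r"GET\s(?P<URL>\S+)")
STATUS_RE = re.compile(r"(?P<threadStatus>STUCK|unstuck)")
# {2} skips to the second ExecuteThread in the event, as in the final query.
THREAD_RE = re.compile(r"(?:.*?ExecuteThread:\s'){2}(?P<threadID>\S+?)'")

def correlate(events):
    """Build one row per (host, threadID) with URL, STUCK and UNSTUCK columns."""
    rows = {}
    for host, raw in events:
        status = STATUS_RE.search(raw)
        thread = THREAD_RE.search(raw)
        if not (status and thread):
            continue  # skip events that carry neither marker
        key = (host, thread.group("threadID"))
        row = rows.setdefault(key, {"URL": "", "STUCK": "", "UNSTUCK": ""})
        if status.group("threadStatus") == "STUCK":
            url = URL_RE.search(raw)
            row["URL"] = url.group("URL") if url else ""
            row["STUCK"] = "STUCK"
        else:
            row["UNSTUCK"] = "UNSTUCK"
    return rows

for (host, thread_id), row in correlate(events).items():
    print(host, thread_id, row["URL"], row["STUCK"], row["UNSTUCK"])
```

One limitation to keep in mind: keying on host and thread ID alone collapses repeated stuck episodes of the same thread into a single row, so a real version would also need to take event time into account.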