Correlate STUCK and UNSTUCK weblogic threads

dbcase
Motivator

Hi,

I have the events below. I need to correlate the execute thread (the second one named in the event) with a STUCK message. That part is easy enough. Where I get stumped is correlating it further with an unstuck message. What I'm hoping to end up with is a table that shows the URL (in this example, GET /rest/icontrol/sites/72178/rules HTTP/1.1), then a column with the word STUCK, then another column with the word UNSTUCK or a blank.

My effort so far is:

index=cox STUCK
| rex "GET\s(?<URL>\S+)"
| rex "\[STUCK] ExecuteThread:\s'(?<threadID>\S+)[']"
| dedup threadID host
| stats count by URL host threadID
| sort host threadID

This just extracts the URL and threadID from the stuck threads and builds a simple table. I'm stuck on matching up the corresponding UNSTUCK message.

Event data:

1/23/17
11:02:50.000 PM 
####<Jan 23, 2017 11:02:50 PM EST> <Info> <WebLogicServer> <ccivirpxa0721> <managedServer12> <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1485230570155> <BEA-000339> <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)' has become "unstuck".> 
host =  portal2 index = linecount = 1 source =  /var/nfs/SAT_SplunkLogs/weblogic/portal2/Portal2_managedServer12.log00477.zip:./managedServer12.log00477 sourcetype =   wls_managedserver splunk_server =   idx6.icontrol.splunkcloud.com
1/23/17
11:02:40.000 PM 
####<Jan 23, 2017 11:02:40 PM EST> <Info> <WebLogicServer> <ccivirpxa0721> <managedServer12> <[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1485230560319> <BEA-000339> <[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel.Default (self-tuning)' has become "unstuck".> 
host =  portal2 index = linecount = 1 source =  /var/nfs/SAT_SplunkLogs/weblogic/portal2/Portal2_managedServer12.log00477.zip:./managedServer12.log00477 sourcetype =   wls_managedserver splunk_server =   idx6.icontrol.splunkcloud.com
1/23/17
11:01:58.000 PM 
####<Jan 23, 2017 11:01:58 PM EST> <Error> <WebLogicServer> <ccivirpxa0721> <managedServer12> <[ACTIVE] ExecuteThread: '26' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1485230518746> <BEA-000337> <[STUCK] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "652" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 652073 ms
[
GET /rest/icontrol/sites/72178/rules HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.0; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
X-format: json
X-ClientInfo: 7.3.8.77
Referer: https://portal.company.com/sp/
Cookie: JSESSIONID=F9XOmTyCrkis7WDdsku5tZO09U0_te9pCpjfHxhsAIqM30KIB53j!-695380872
Via: 1.1 10.210.192.38
X-Forwarded-For: 10.210.192.5
X-Forwarded-Host: portal.company.com
X-Forwarded-Server: 10.210.192.38
Connection: Keep-Alive
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
<snip>
1 Solution

lguinn2
Legend

Try this

index=cox "stuck" OR "unstuck"
| rex "GET\s(?<URL>\S+)"
| rex "\[(?<threadStatus>[STUCK|ACTIVE])\] ExecuteThread:\s'(?<threadID>\S+)[']"
| eval timestamp=strftime(_time,"%x %X")
| eval threadStatus=if(threadStatus=="ACTIVE","Unstuck",threadStatus)
| sort _time
| stats list(timestamp) as Time list(threadStatus) as "Thread Status" by host threadID

It may not be the format that you asked for, but I think it will work!

dbcase
Motivator

Hi lguinn,

Hmmm, I think you have gotten it MUCH closer, but the rex for threadStatus isn't working.

dbcase
Motivator

Found it. This: \[(?<threadStatus>[STUCK|ACTIVE])\] needed to be changed to this: \[(?<threadStatus>[STUCK|ACTIVE]+)\]

Now sorting thru the results, will let you know if there are any other modifications....

THANKS!!!
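
A note on why the extra + matters, as far as I understand it: [STUCK|ACTIVE] in square brackets is a character class, so it matches exactly one character out of S, T, U, C, K, |, A, I, V, E rather than either whole word. Adding + lets it match a run of those characters, which works here but would also accept something like [TICK]. A plain alternation is stricter. Below is a minimal sketch that can be pasted into a search bar to check this. The makeresults event is a shortened, made-up copy of the STUCK line above, not real data:

```
| makeresults
| eval _raw="<[STUCK] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy>"
| rex "\[(?<threadStatus>STUCK|ACTIVE)\] ExecuteThread:\s'(?<threadID>\d+)'"
| table threadStatus threadID
```

One thing to watch on the full events: rex keeps only the first match per event by default, and in the real STUCK event the first ExecuteThread is the [ACTIVE] thread '26' that logged the message. On that event this pattern would pick up ACTIVE and 26 rather than STUCK and 20.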

lguinn2
Legend

good catch on the regular expression!

dbcase
Motivator

Hi lguinn,

Based on your query (thank you!), I modified it to get what I was hoping for. The end result looks like this:

index=cox stuck OR unstuck
| rex "GET\s(?<URL>\S+)"
| rex "(?<threadStatus>(STUCK|unstuck))"
| rex "(?:.*?ExecuteThread:\s'){2}(?<threadID>\S+)[']"
| eval timestamp=strftime(_time,"%x %X")
| sort _time
| dedup threadID host _time
| stats list(timestamp) as Time list(threadStatus) as "Thread Status" by host threadID
| sort host threadID
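
For anyone who wants the layout from the original question (one row per thread with the URL, a STUCK column, and an UNSTUCK-or-blank column), here is a rough sketch. It has not been run against the real data. The field names stuckThread, unstuckThread, stuckCount and unstuckCount are made up for illustration. It also assumes that grouping by host and threadID is enough to pair the two messages:

```
index=cox ("STUCK" OR "unstuck")
| rex "GET\s(?<URL>\S+)"
| rex "\[STUCK\] ExecuteThread:\s'(?<stuckThread>\d+)'"
| rex "ExecuteThread:\s'(?<unstuckThread>\d+)' for queue: '[^']*' has become"
| eval threadID=coalesce(stuckThread, unstuckThread)
| stats values(URL) as URL
        count(stuckThread) as stuckCount
        count(unstuckThread) as unstuckCount
        by host threadID
| eval Stuck=if(stuckCount>0, "STUCK", ""), Unstuck=if(unstuckCount>0, "UNSTUCK", "")
| table host threadID URL Stuck Unstuck
| sort host threadID
```

The second rex anchors on the literal [STUCK] tag, so it skips the [ACTIVE] thread that reported the problem. The third only matches the ExecuteThread that is followed by "has become". Because WebLogic reuses thread numbers, a thread that gets stuck more than once in the search window collapses into a single row here. If several managed servers log under the same host, the server name would probably need to be added to the by clause as well.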