
How to combine my searches to sort search results by a percentage?

renteriaeddie
Engager

Hello.

I am fairly new to the Splunk world, and my current job has me monitor various Splunk dashboards throughout the day.

We have a dashboard that breaks down HTTP status codes across a number of criteria.

One section of this dashboard shows HTTP status codes greater than 499 and less than 600, to track the 5XX error codes returned to us. It plots them on a bar graph showing the total number of 5XX codes per 1-minute span within a 30-minute time range.

The search looks like this:

`| multisearch [search index="blahblah" source="blahblah" eventName=GetStatusRequestAccepted | spath route | search route="*blah/blah/*" | eval ReportKey="Get Status Request Accepted"]
 [search index="blahblah" source="blahblahblah" eventName=GetStatusResponseReceived | spath route | search route="*blah/blah/*" | eval ReportKey="Get Status Response Received"]
 | timechart span=30s count by ReportKey`

A second panel tracks the number of 5XX error codes in a bar graph. It also covers a 30-minute time range and uses 1-minute buckets.

The search looks like this:

`index=blahblah sourcetype=blahblah service="blah" ("message.arg$$1.statusCode">499 AND "message.arg$$1.statusCode"<600)
| rename message.arg$$1.statusText AS status_text
| rename message.arg$$1.statusCode AS error_code
| eval PageError=status_text." - ".error_code
| timechart span=1m count by PageError`
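To make the filtering and labeling steps of this search concrete, here is a small Python sketch (not SPL) that mimics them on a few made-up events. It assumes each event is a dict with hypothetical `status_code` and `status_text` keys standing in for `message.arg$$1.statusCode` and `message.arg$$1.statusText`, and it leaves out the per-minute bucketing that `timechart span=1m` adds.

```python
from collections import Counter

# Made-up events; the keys are hypothetical stand-ins for
# message.arg$1.statusCode and message.arg$1.statusText.
events = [
    {"status_code": 200, "status_text": "OK"},
    {"status_code": 504, "status_text": "Gateway Timeout"},
    {"status_code": 504, "status_text": "Gateway Timeout"},
    {"status_code": 502, "status_text": "Bad Gateway"},
    {"status_code": 404, "status_text": "Not Found"},
]

def page_error_counts(events):
    """Keep only 5XX events (499 < code < 600) and count them by a
    "statusText - statusCode" label, like the eval PageError=... step."""
    counts = Counter()
    for e in events:
        code = e["status_code"]
        if 499 < code < 600:
            counts[f'{e["status_text"]} - {code}'] += 1
    return counts

print(page_error_counts(events))
```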

What I am wondering is this: would it be possible to pipe the results of the first search into the second search and have the second search calculate the percentage of events with a 5XX status code?

For example, say there were 30 requests and 30 responses received, but 10 of the responses came back with a 504 error code. Could we have the table show what percentage of the responses had a 5XX error code?
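The arithmetic being asked for is the share of responses whose status code falls in the 500 through 599 range. A minimal Python sketch using the numbers from the example (30 responses, 10 of them 504s); the function name is hypothetical:

```python
def percent_5xx(status_codes):
    """Return the percentage of status codes in the 5XX range (500-599)."""
    if not status_codes:
        return 0.0  # design choice here: no responses means 0%, not a division error
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes) * 100

# 30 responses: 20 successful (200) and 10 Gateway Timeouts (504)
responses = [200] * 20 + [504] * 10
print(round(percent_5xx(responses), 1))  # 33.3
```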

I am not familiar with SPL, but I basically want to pipe the output of the first search into the second one, "grep" out the 5XX status codes, and have it return a percentage, if that makes sense.

Sorry for the total noob question.


niketn
Legend

Looking at your data, I expect two types of events: GetStatusRequestAccepted and GetStatusResponseReceived. Your response events carry status codes such as 2XX and 5XX.

This explains why you have two separate panels. The first panel compares requests with responses to capture whether a response was received. The second panel shows the 5XX codes among the responses.

The following query should combine requests and responses and calculate the percentage of 5XX errors relative to the total requests received (PS: some requests might not have received a response).

index="blahblah" source="blahblah" (eventName="GetStatusRequestAccepted" OR eventName="GetStatusResponseReceived")
| timechart count(eval(eventName=="GetStatusRequestAccepted")) as Request count(eval(eventName=="GetStatusResponseReceived")) as Response count(eval('message.arg$$1.statusCode'>=500 AND 'message.arg$$1.statusCode'<600)) as Error
| eval Percent=round((Error/Request)*100)
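For intuition about what this timechart produces, here is a Python sketch that groups made-up `(epoch_seconds, eventName, statusCode)` tuples into fixed time buckets and computes the same four columns per bucket. The tuple layout, the 60-second span, and the function name are assumptions for illustration only; Python's `round()` sends exact .5 ties to the even number, which may not match Splunk's `round` in that edge case.

```python
from collections import defaultdict

# Made-up events: (epoch_seconds, eventName, statusCode or None for requests)
events = [
    (0,  "GetStatusRequestAccepted",  None),
    (5,  "GetStatusResponseReceived", 200),
    (10, "GetStatusRequestAccepted",  None),
    (15, "GetStatusResponseReceived", 504),
    (70, "GetStatusRequestAccepted",  None),
    (75, "GetStatusResponseReceived", 200),
]

def bucketed_error_percent(events, span=60):
    """Per time bucket, count requests, responses and 5XX responses,
    then compute Percent = round(Error / Request * 100)."""
    buckets = defaultdict(lambda: {"Request": 0, "Response": 0, "Error": 0})
    for ts, name, code in events:
        row = buckets[ts - ts % span]  # start time of the bucket this event falls in
        if name == "GetStatusRequestAccepted":
            row["Request"] += 1
        elif name == "GetStatusResponseReceived":
            row["Response"] += 1
            if code is not None and 500 <= code < 600:
                row["Error"] += 1
    for row in buckets.values():
        # Leave Percent as None for a bucket with no requests (avoids dividing by zero)
        row["Percent"] = round(row["Error"] / row["Request"] * 100) if row["Request"] else None
    return dict(sorted(buckets.items()))

for start, row in bucketed_error_percent(events).items():
    print(start, row)
```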

However, Percent can also be calculated directly from the response events alone:

index="blahblah" source="blahblah" eventName="GetStatusResponseReceived"
| timechart count as Total count(eval('message.arg$$1.statusCode'>=500 AND 'message.arg$$1.statusCode'<600)) as Error
| eval Percent=round((Error/Total)*100)

Please also review the HTTP status code classes: 1XX is informational, 2XX is success, 3XX is redirection, 4XX is client error, and 5XX is server error (fault): https://wiki.splunk.com/Http_status.csv
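Because the 5XX check is a range test (500 through 599, not up to and including 600), a tiny helper that maps a code to its class makes the boundaries explicit. A Python sketch; the helper and dictionary names are hypothetical:

```python
def status_class(code):
    """Map an HTTP status code to its class label, e.g. 504 -> "5XX"."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a standard HTTP status code: {code}")
    return f"{code // 100}XX"

CLASS_NAMES = {
    "1XX": "Informational",
    "2XX": "Success",
    "3XX": "Redirection",
    "4XX": "Client error",
    "5XX": "Server error",
}

for code in (200, 302, 404, 504):
    print(code, status_class(code), CLASS_NAMES[status_class(code)])
```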

PS: Combining both queries only makes sense if you want to show requests that did not get any response. In that case you need a field shared by a request and its response; in your case I think that would be sessionId or transactionId (please verify whether transactionId stays the same between the request and the response for the same transaction).
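The idea behind finding requests without a response is to match request and response events on a shared key. A Python sketch, assuming (as suggested above, but still to be verified) that `transactionId` is identical on a request and its response; the IDs and function name are made up:

```python
def unanswered_requests(requests, responses, key="transactionId"):
    """Return the key values of requests that have no matching response."""
    answered = {r[key] for r in responses}
    return [q[key] for q in requests if q[key] not in answered]

requests = [{"transactionId": "t1"}, {"transactionId": "t2"}, {"transactionId": "t3"}]
responses = [
    {"transactionId": "t1", "statusCode": 200},
    {"transactionId": "t3", "statusCode": 504},
]
print(unanswered_requests(requests, responses))  # ['t2']
```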

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

DalJeanis
SplunkTrust

Assuming your blahblahs are identical, you can get rid of the multisearch like this...

index="blahblah" source="blahblah" (eventName=GetStatusRequestAccepted OR eventName=GetStatusResponseReceived)
| spath route
| search route="*blah/blah/*"
| eval ReportKey=if(eventName=="GetStatusRequestAccepted","Get Status Request Accepted","Get Status Response Received")
| timechart span=30s count by ReportKey

Avoid multisearch, subsearches, and joins if you possibly can, because they come with limitations (such as caps on result counts and run time) that can quietly break your results.
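For intuition, the single search above does two things in one pass: it labels each event from its eventName, then counts labels per 30-second bucket. A Python sketch of those two steps using made-up `(epoch_seconds, eventName)` pairs; the names are hypothetical, and unlike timechart, which would typically show empty buckets as rows of zeros, this sketch simply omits them.

```python
from collections import Counter

REPORT_KEYS = {
    "GetStatusRequestAccepted": "Get Status Request Accepted",
    "GetStatusResponseReceived": "Get Status Response Received",
}

def count_by_report_key(events, span=30):
    """Count events per (bucket start, ReportKey), mimicking
    eval ReportKey=... | timechart span=30s count by ReportKey."""
    counts = Counter()
    for ts, name in events:
        key = REPORT_KEYS.get(name)
        if key is not None:  # ignore events with any other eventName
            counts[(ts - ts % span, key)] += 1
    return counts

events = [(1, "GetStatusRequestAccepted"), (12, "GetStatusResponseReceived"),
          (31, "GetStatusRequestAccepted"), (44, "GetStatusRequestAccepted")]
for (start, key), n in sorted(count_by_report_key(events).items()):
    print(start, key, n)
```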


niketn
Legend

Is your goal to quantify 5XX errors for "Request Accepted" and "Response Received"?
Also, if you want a percentage, is it the percentage of 5XX versus all other statuses?

Following are the mock indexes, sources, and sourcetypes you have used:

Request Accepted: index="blahblah" source="blahblah"
Response Received: index="blahblah" source="blahblahblah"
and 5XX errors: index=blahblah sourcetype=blahblah

Could you please confirm whether the index is the same for all three, and whether the source differs only for Response Received?

Also, based on the question, it seems the request/response events are either XML or JSON. Can you please confirm whether the 5XX errors are part of these events or separate events?

Would it be possible for you to post dummy/mock versions of the request, response, and 5XX events?


renteriaeddie
Engager

For the first question: yes, basically. We have a bar graph that counts the number of 5XX errors.
For the second question: yes, it is the percentage of 5XX versus all other statuses. Our current dashboard only shows counts, but some business types have asked for the percentage of 5XX events within a given time frame, and short of doing the math manually, our dashboard doesn't provide that.

For the mock indexes, you are correct: it is the same index for all three, my mistake. I work the graveyard shift and was putting this together yesterday before my caffeine kicked in, so I wasn't too focused on the names I was mocking.

The 5XX errors show up on the Request/Response line chart as a divergence. As for the format, I believe it is JSON.

Mocked events:

Response:
{
   agentId: 350ab9fc3a6f4c02b440a6fc6273530d
   awsRequestId: 821b2b4c-cbe6-4f2b-98ac-42c109540b78
   className: com.company.domain.product.services
   clientId: mobile_android
   client_ip: x.x.x.x
   correlationId: numbers
   customerId: numbers
   eventName: GetStatusResponseReceived
   instanceId: i-aws instance
   message: {
      arg$1: {
         response: { ... }
         statusCode: 200
         statusText: OK
         transactionId: numbers
      }
   }
   methodName: logEvent Line Number: 341
   route: /api/product/distro/version/purpose
   service: product
   sessionId: numbers
   timestamp: 2017-04-29T09:43:58.912Z
   type: INFO
   version: 0.9
}

Request:
{
   agentId: N/A
   awsRequestId: generic request ID number
   className: com.company.domain.product.services
   clientId: mobile_android
   client_ip: x.x.x.x
   correlationId: numbers
   customerId: numbers
   eventName: GetStatusRequestAccepted
   instanceId: i-aws instance
   message: {
      channel: app
      correlation: numbers
      customerId: numbers
      transactionId: numbers
   }
   methodName: getStatus Line Number: 66
   route: /api/product/distro/version/purpose
   service: product
   sessionId: numbers
   timestamp: 2017-04-29T09:41:58.603Z
   type: INFO
   version: 0.9
}

5XX Error:
{
   agentId: N/A
   awsRequestId: generic request ID number
   className: com.company.domain.product.services
   clientId: client's ID
   client_ip: x.x.x.x
   correlationId: numbers
   customerId: numbers
   eventName: GetStatusResponseReceived
   instanceId: i-aws instance
   message: {
      arg$1: {
         statusCode: 504
         statusText: Gateway Timeout
         transactionId: numbers
      }
   }
   methodName: logEvent Line Number: 343
   route: /api/product/distro/version/purpose
   service: product
   sessionId: numbers
   timestamp: 2017-04-27T15:44:53.393Z
   type: ERROR
   version: 0.9
}
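The mocked events make the field names in the searches easier to read: `message.arg$1.statusCode` (written `message.arg$$1.statusCode` in the searches above) refers to the statusCode key nested inside arg$1 inside message. A Python sketch that flattens a nested event into dotted names in that style; the event is a trimmed copy of the 5XX mock written as a Python dict, and the `flatten` helper is hypothetical and ignores arrays.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs, similar in spirit
    to the dotted names given to extracted JSON fields."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat

event = {
    "eventName": "GetStatusResponseReceived",
    "message": {"arg$1": {"statusCode": 504, "statusText": "Gateway Timeout"}},
    "type": "ERROR",
}
fields = flatten(event)
print(fields["message.arg$1.statusCode"])  # 504
```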
