I want to create an alert which will find requests which have not received a response.
I have created the following search, which finds the requests that have no response. The request has an id called transaction_id, and the response carries the same identifier in a field called original_transaction_id.
eval tid = coalesce(transaction_id, original_transaction_id)
| transaction tid maxpause=2m startswith="request" endswith="response" keepevicted=true
| where evicted=1
This works as a search. As an alert, however, it produces false alerts, because the response may take up to 10 seconds to arrive after the request has been posted. The alert needs to take that into account and ignore any requests that are less than 10 seconds old.
Can someone please help me add a time restriction on the "request" event to prevent the false alerts?
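The following is a minimal Python sketch of why the alert fires falsely. It runs outside Splunk, and the names `coalesce`, `unanswered` and the `kind` field are hypothetical. It builds the shared key the way `coalesce(transaction_id, original_transaction_id)` does and flags every key that has a request but no response. It does this no matter how recently the request arrived.

```python
from typing import Optional

def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first non-None argument, similar to Splunk's coalesce()."""
    for value in values:
        if value is not None:
            return value
    return None

def unanswered(events: list) -> set:
    """Naive check: ids with a request but no response, ignoring request age."""
    requests, responses = set(), set()
    for event in events:
        tid = coalesce(event.get("transaction_id"), event.get("original_transaction_id"))
        # "kind" is a hypothetical field marking each event as request or response.
        if event["kind"] == "request":
            requests.add(tid)
        else:
            responses.add(tid)
    return requests - responses

now = 1_000_000.0  # arbitrary "current time" in epoch seconds
events = [
    {"kind": "request", "transaction_id": "A", "_time": now - 60},
    {"kind": "response", "original_transaction_id": "A", "_time": now - 55},
    # Posted 3 seconds ago; its response may still arrive within 10 seconds.
    {"kind": "request", "transaction_id": "B", "_time": now - 3},
]
print(unanswered(events))  # {'B'}: flagged even though it is only 3 seconds old
```

Request B has not yet had its full 10 seconds to receive a response, so flagging it is exactly the false alert described in the question.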
@stewartevans, for your use case you would be better off using stats instead of transaction for correlation. Refer to the About event grouping and correlation documentation. stats gives you more control over the correlation, because you can apply whatever search filters you need after the stats command. stats should also perform better than transaction over longer time ranges and larger numbers of events.
Following is a run-anywhere example with sample data based on the points described in the question. It generates events with a request and a response for transaction ids 1234 and 4567, but no response for 8910. For testing purposes it also adds a request 1112, which has _time set to 9 seconds before the current time. While testing, you can change the | eval _time=_time-9 pipe to 10, 11, etc. to test the less than, equal to and greater than 10 second scenarios.
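The sample-data part of the run-anywhere search (makeresults, makemv, mvexpand, kv, strptime and the appended request) can be mirrored in Python to show what events it produces. This is a rough sketch. The `kv` function here is a hypothetical regex stand-in for Splunk's kv extraction, and times are parsed in the local timezone.

```python
import re
import time
from datetime import datetime

# Same semicolon-delimited sample as the eval data="..." string in the SPL example.
DATA = (
    'Time="2018/07/31 01:00:00" some request transaction_id=1234;'
    'Time="2018/07/31 01:00:10" some response original_transaction_id=1234;'
    'Time="2018/07/31 01:10:00" some request transaction_id=4567;'
    'Time="2018/07/31 01:10:20" some response original_transaction_id=4567;'
    'Time="2018/07/31 02:00:00" some request transaction_id=8910;'
)

# Matches key=value or key="quoted value".
KV_PATTERN = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def kv(raw: str) -> dict:
    """Hypothetical stand-in for kv: turn key=value pairs in _raw into fields."""
    event = {"_raw": raw}
    for key, quoted, bare in KV_PATTERN.findall(raw):
        event[key] = quoted or bare
    return event

def make_sample(now: float, offset: float = 9) -> list:
    events = []
    # makemv delim=";" + mvexpand: one event per non-empty segment.
    for raw in DATA.split(";"):
        if not raw:
            continue  # skip the empty piece after the trailing ";"
        event = kv(raw)
        # eval _time=strptime(Time, ...) followed by fields - Time
        event["_time"] = datetime.strptime(event.pop("Time"), "%Y/%m/%d %H:%M:%S").timestamp()
        events.append(event)
    # append [| makeresults | eval _time=_time-9 ...]: a request `offset` seconds old.
    extra = kv("some request transaction_id=1112")
    extra["_time"] = now - offset
    events.append(extra)
    return events

events = make_sample(time.time())
for event in events:
    print(event.get("transaction_id") or event.get("original_transaction_id"), event["_time"])
```

Passing `offset=10` or `offset=11` corresponds to editing the `_time-9` pipe when testing the boundary cases.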
1. The searchmatch() evaluation function has been used to create a type field with the values request and response for the corresponding events, since such a field does not seem to be present in your data, judging by the transaction query you ran.
2. The stats command groups type together for each transaction id, i.e. tid, using the list() function.
3. Although you don't need it for this use case, for this sample data mvindex(type,0) should give the startswith condition, i.e. request, and mvindex(type,1) should give the endswith condition, i.e. response.
4. In your case you are interested in events where a request exists but there is no response, i.e. | search type="request" AND type!="response"
5. Further, such events will have the same earliestTime and latestTime, as there is no response. now()-earliestTime has been used to get the time elapsed between the request being received and the current time, so that we can keep only requests received more than 10 seconds ago.
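The five steps above can be sketched in Python to make the correlation logic explicit outside Splunk. This is a sketch under assumptions. `event_type` approximates searchmatch() with a simple case-insensitive whole-word check, and the function names are hypothetical. Each function's docstring names the SPL stage it mirrors.

```python
import time
from collections import defaultdict

def event_type(raw: str) -> str:
    """Approximates eval type=case(searchmatch("request"),"request",
    searchmatch("response"),"response",true(),"N/A") with a word check."""
    terms = raw.lower().split()
    if "request" in terms:
        return "request"
    if "response" in terms:
        return "response"
    return "N/A"

def stats_by_tid(events: list) -> dict:
    """Mirrors: stats list(type) as type min(_time) as earliestTime
    max(_time) as latestTime by tid."""
    rows = defaultdict(lambda: {"type": [], "earliestTime": float("inf"), "latestTime": float("-inf")})
    for event in events:
        # eval tid=coalesce(transaction_id, original_transaction_id)
        tid = event.get("transaction_id")
        if tid is None:
            tid = event.get("original_transaction_id")
        if tid is None:
            continue  # stats ... by tid drops events without a tid
        row = rows[tid]
        row["type"].append(event_type(event["_raw"]))
        row["earliestTime"] = min(row["earliestTime"], event["_time"])
        row["latestTime"] = max(row["latestTime"], event["_time"])
    return dict(rows)

def unanswered_requests(events: list, now: float, threshold: float = 10.0) -> dict:
    """Mirrors: search type="request" AND type!="response"
    | eval requestTimeDuration=now()-earliestTime | where requestTimeDuration>10"""
    alerts = {}
    for tid, row in stats_by_tid(events).items():
        if "request" in row["type"] and "response" not in row["type"]:
            duration = now - row["earliestTime"]
            if duration > threshold:
                alerts[tid] = duration
    return alerts

now = time.time()
events = [
    {"_raw": "some request transaction_id=1234", "transaction_id": "1234", "_time": now - 3600},
    {"_raw": "some response original_transaction_id=1234", "original_transaction_id": "1234", "_time": now - 3590},
    {"_raw": "some request transaction_id=8910", "transaction_id": "8910", "_time": now - 1800},
    {"_raw": "some request transaction_id=1112", "transaction_id": "1112", "_time": now - 9},
]
print(sorted(unanswered_requests(events, now)))  # ['8910']
```

Request 1112 is only 9 seconds old, so it is held back. Evaluating the same events 2 seconds later (`now + 2`) would include it. With this event order, `stats_by_tid(events)["1234"]["type"]` is `["request", "response"]`, which matches the mvindex note in step 3.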
| makeresults
| eval data="Time=\"2018/07/31 01:00:00\" some request transaction_id=1234;Time=\"2018/07/31 01:00:10\" some response original_transaction_id=1234;Time=\"2018/07/31 01:10:00\" some request transaction_id=4567;Time=\"2018/07/31 01:10:20\" some response original_transaction_id=4567;Time=\"2018/07/31 02:00:00\" some request transaction_id=8910;"
| makemv data delim=";"
| mvexpand data
| rename data as _raw
| KV
| eval _time=strptime(Time,"%Y/%m/%d %H:%M:%S")
| fields - Time
| append [| makeresults
| eval _time=_time-9
| eval _raw="some request transaction_id=1112"
| KV]
| eval tid=coalesce(transaction_id, original_transaction_id)
| eval type=case(searchmatch("request"),"request",searchmatch("response"),"response",true(),"N/A")
| stats list(type) as type min(_time) as earliestTime max(_time) as latestTime by tid
| search type="request" AND type!="response"
| eval requestTimeDuration=now()-earliestTime
| where requestTimeDuration>10
| fieldformat earliestTime=strftime(earliestTime,"%Y/%m/%d %H:%M:%S")
| fieldformat latestTime=strftime(latestTime,"%Y/%m/%d %H:%M:%S")
PS: fieldformat has been applied to display the epoch values of earliestTime and latestTime as human-readable time strings.
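fieldformat changes only how a value is displayed, while the underlying epoch number stays intact for any later comparisons. By contrast, `eval earliestTime=strftime(...)` would overwrite it with a string. The Python sketch below imitates that split by formatting a display copy of the row. The `display` helper is hypothetical, and times are in the local timezone.

```python
from datetime import datetime

FORMAT = "%Y/%m/%d %H:%M:%S"  # same format string as the fieldformat calls

def display(row: dict) -> dict:
    """Return a formatted copy for display; the epoch values in row are untouched."""
    shown = dict(row)
    for field in ("earliestTime", "latestTime"):
        shown[field] = datetime.fromtimestamp(row[field]).strftime(FORMAT)
    return shown

epoch = datetime(2018, 7, 31, 2, 0, 0).timestamp()
row = {"tid": "8910", "earliestTime": epoch, "latestTime": epoch}
print(display(row))
# {'tid': '8910', 'earliestTime': '2018/07/31 02:00:00', 'latestTime': '2018/07/31 02:00:00'}
```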
If you need events where a response was received but took longer than 10 seconds to arrive, your search filter can be changed to the following:
| search type="request" AND type="response"
| eval duration=latestTime-earliestTime
| where duration>10
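The same filter, expressed as a Python sketch over per-tid rows shaped like the stats output, shows which transactions count as slow. The rows and the `slow_responses` name are hypothetical. The durations are chosen to match the sample data: 10 seconds for 1234 and 20 seconds for 4567.

```python
def slow_responses(rows: dict, threshold: float = 10.0) -> dict:
    """Mirrors: search type="request" AND type="response"
    | eval duration=latestTime-earliestTime | where duration>10"""
    slow = {}
    for tid, row in rows.items():
        if "request" in row["type"] and "response" in row["type"]:
            duration = row["latestTime"] - row["earliestTime"]
            if duration > threshold:
                slow[tid] = duration
    return slow

# Hypothetical per-tid rows (times in seconds relative to 01:00:00).
rows = {
    "1234": {"type": ["request", "response"], "earliestTime": 0.0, "latestTime": 10.0},
    "4567": {"type": ["request", "response"], "earliestTime": 600.0, "latestTime": 620.0},
    "8910": {"type": ["request"], "earliestTime": 3600.0, "latestTime": 3600.0},
}
print(slow_responses(rows))  # {'4567': 20.0}
```

Transaction 1234 took exactly 10 seconds, so the strict `>` comparison excludes it. Use `>=` if exactly 10 seconds should also count as slow.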
This is brilliant, @niketnilay! I've just tested your recommendation and it appears to work perfectly. I also learnt a lot about stats and sample data generation at the same time. Thank you very much!
@stewartevans I am glad you found it useful. I have learnt these things by hanging out here on Splunk Answers 🙂 Now you need to "pass on" the knowledge.
The link I provided is by Nick Mealy, and his flowchart for choosing between event grouping and correlation approaches is epic 🙂 More commands have been introduced since, like union in Splunk 6.6 and the previously undocumented gem multisearch. They will probably be added to the flowchart eventually as well.