I need to combine two events into a transaction:
1) request event has 123
2) response event has 345123
I'd like to combine these two events into a transaction using id: 123
How can I do that?
@xiaoyunwuxie, the following is a run-anywhere search based on the sample data provided. It uses rex to extract the type of event (Request or Response) as type and the message id as msgId from each event. It then uses stats to correlate them and find the duration between request and response.
PS: The commands from makeresults to search only generate and isolate the sample request/response data. You would need to use your base search instead, i.e. index="yourIndexName" sourcetype="yourSourceType" "<ABCRequest><msgId>" OR "<ABCResponse><refMsgId>"
| makeresults
| eval _raw="<ABCRequest><msgId>123</msgId></ABCRequest>", _time=strptime("2018/03/06 09:00:00","%Y/%m/%d %H:%M:%S")
| append [| makeresults
| eval _raw="<ABCResponse><refMsgId>123</refMsgId><msgId>345</msgId></ABCResponse>", _time=strptime("2018/03/06 10:00:00","%Y/%m/%d %H:%M:%S")]
| search "<ABCRequest><msgId>" OR "<ABCResponse><refMsgId>"
| rex "\<ABC(?<type>[^\>]+)\>\<(msgId|refMsgId)\>(?<msgId>[^\<]+)\<\/(msgId|refMsgId)\>"
| stats count as eventcount first(_time) as EarliestTime last(_time) as LatestTime values(type) as type
| search type="Request" AND type="Response"
| eval duration=LatestTime-EarliestTime
| eval _time=EarliestTime
Thank you, niketnilay! In my case, I only needed to know which requests do not have a response, so after the transaction statement I used search closed_txn=0.
You did bring up an interesting point: what if I want to know how much time elapsed between each request and response, and count those whose elapsed time (from request to response) is over 10 seconds?
@niketnilay, thank you for the input, and sorry for the delayed response. I tried your solution, and it seems to be treating all the requests and responses as one event. I have tons of these requests and responses, and I need to calculate the duration for each pair. So I used part of yours. Thank you very much!
| transaction msgId startswith="ABCRequest" endswith="ABCResponse" keepevicted=true | search closed_txn=1
| eval SLA = case(duration<=0.15,"ok",duration>0.15 AND duration<=0.25,"warning",duration>0.25 AND duration<0.30,"critical",true(),"breached")
| chart count by SLA
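For reference, a fuller sketch of the transaction approach for finding requests that never got a response. It assumes msgId is extracted with the rex from the earlier answer and reuses the same placeholder base search (yourIndexName and yourSourceType are placeholders, not real names). With keepevicted=true, transactions that never reached the endswith event should carry closed_txn=0. Note that a response with no matching request may also show up as an evicted transaction.

```spl
index="yourIndexName" sourcetype="yourSourceType" "<ABCRequest><msgId>" OR "<ABCResponse><refMsgId>"
| rex "\<ABC(?<type>[^\>]+)\>\<(msgId|refMsgId)\>(?<msgId>[^\<]+)\<\/(msgId|refMsgId)\>"
| transaction msgId startswith="ABCRequest" endswith="ABCResponse" keepevicted=true
| search closed_txn=0
| table _time msgId type
```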
@xiaoyunwuxie, sorry, I had missed msgId as the key in stats, i.e. by msgId
It should work faster than transaction, as stated earlier. Please try the following:
<yourBaseSearch> "<ABCRequest>" OR "<ABCResponse>"
| stats count as eventcount first(_time) as EarliestTime last(_time) as LatestTime values(type) as type by msgId
| search type="Request" AND type!="Response"
| eval duration=now()-EarliestTime
Please try this and confirm!
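A run-anywhere sketch of the per-msgId grouping, in case it helps to check the behavior before pointing it at real data. The sample events are assumptions: msgId 124 and the timestamps are made up, with 124 standing in for a request that has no response. It uses min(_time) and max(_time) rather than first and last, since first and last depend on the order in which events reach stats.

```spl
| makeresults
| eval _raw="<ABCRequest><msgId>123</msgId></ABCRequest>", _time=strptime("2018/03/06 09:00:00","%Y/%m/%d %H:%M:%S")
| append [| makeresults
    | eval _raw="<ABCResponse><refMsgId>123</refMsgId><msgId>345</msgId></ABCResponse>", _time=strptime("2018/03/06 09:00:12","%Y/%m/%d %H:%M:%S")]
| append [| makeresults
    | eval _raw="<ABCRequest><msgId>124</msgId></ABCRequest>", _time=strptime("2018/03/06 09:05:00","%Y/%m/%d %H:%M:%S")]
| rex "\<ABC(?<type>[^\>]+)\>\<(msgId|refMsgId)\>(?<msgId>[^\<]+)\<\/(msgId|refMsgId)\>"
| stats count as eventcount min(_time) as EarliestTime max(_time) as LatestTime values(type) as type by msgId
| eval duration=LatestTime-EarliestTime
```

This should give one row per msgId: 123 with both Request and Response and a duration of 12 seconds, and 124 with only Request.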
@niketnilay, wonderful, this is perfect. Thank you so much! Yes, it's much more powerful! 🙂
Awesome!!! Don't forget to upvote comments that helped 🙂
The stats command can be used for both the use cases you have mentioned.
If you want requests without a response, you can use the following search:
....
| stats count as eventcount first(_time) as EarliestTime last(_time) as LatestTime values(type) as type
| search type="Request" AND type!="Response"
| eval duration=now()-EarliestTime
If you want to get the SLA for the duration between request and response, you can try the following (I have created some sample SLA thresholds based on duration in seconds; you can use your own):
...
| stats count as eventcount first(_time) as EarliestTime last(_time) as LatestTime values(type) as type
| search type="Request" AND type="Response"
| eval duration=LatestTime-EarliestTime
| eval SLA = case(duration<=10,"ok",duration>10 AND duration<=30,"warning",duration>30 AND duration<60,"critical",true(),"breached")
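To count only the request/response pairs that took longer than 10 seconds, something along these lines might work. It is a sketch under a few assumptions: results are grouped by msgId so each pair is measured separately, the rex from the earlier answer supplies type and msgId, and the field names requests, responses and slowPairs are made up for illustration. Counting each type with count(eval(...)) avoids filtering on the multivalue type field.

```spl
<yourBaseSearch> "<ABCRequest>" OR "<ABCResponse>"
| rex "\<ABC(?<type>[^\>]+)\>\<(msgId|refMsgId)\>(?<msgId>[^\<]+)\<\/(msgId|refMsgId)\>"
| stats min(_time) as EarliestTime max(_time) as LatestTime count(eval(type="Request")) as requests count(eval(type="Response")) as responses by msgId
| where requests>0 AND responses>0
| eval duration=LatestTime-EarliestTime
| where duration>10
| stats count as slowPairs
```

Replacing the last two lines with the case() expression above and `| chart count by SLA` would give a count per SLA bucket instead.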
Thank you so much for the idea! That's very helpful. I eventually used: rex "<(msgId|refMsgId)>(?<msgId>[^<]+)<\/(msgId|refMsgId)>" | transaction msgId
@xiaoyunwuxie, you should evaluate what you need to do once you have correlated the events. stats can do most of what transaction does, and it may perform much better. Refer to the Splunk documentation on event correlation: http://docs.splunk.com/Documentation/Splunk/latest/Search/Abouteventcorrelation
Are the "123" and "345123" values of the same field?
Can you share some sample events?
Sorry, the tags were stripped away.
1) request event: <ABCRequest><msgId>123</msgId></ABCRequest>
2) response event: <ABCResponse><refMsgId>123</refMsgId><msgId>345</msgId></ABCResponse>
The msgId in the request and the refMsgId in the response are the same (here, 123). I'd like to use 123 to combine them into one transaction.