I have 5 basic SOAP web services, each with a different name (for example, 'DeliveryScheduleRequest'), that are logged by Splunk. I set up a field extraction for those web services, which works. I then set up an alert that sends an email any time one of these web services has a response time longer than 5 seconds.
Now I would like that extracted field to show in the subject line of the email whenever the alert fires. How would I do that?
I was finally able to figure it out. I'll post my findings for others to see in the future.
I had an extracted field called 'TestCall' which represented the web service calls in my events. It wasn't working because 'TestCall' did not appear in my search query; it was only defined as an extracted field. Below is my query, which works as expected:
index=uvtrans | transaction GUID startswith="*request" endswith="*response" | where duration>5 AND isnotnull(TestCall)
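Put together with the email token, the relevant savedsearches.conf lines might look like the sketch below. The stanza name and subject text are the ones from my alert; all other settings are left out, and I am assuming nothing else in the stanza needs to change.

```
# Sketch only: just the two lines that matter for the subject token.
[Response Time > 5 Sec]
# $result.TestCall$ is filled from the TestCall field of the first result row
action.email.subject.alert = Splunk Alert: $result.TestCall$ Resp > 5 Sec
# TestCall has to be referenced in the search itself for the token to resolve
search = index=uvtrans | transaction GUID startswith="*request" endswith="*response" | where duration>5 AND isnotnull(TestCall)
```

Note that the TestCall filter sits after the transaction command, so both the request and the response events still reach transaction and the duration is calculated the same way as before.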
If your search results have a field web_service, you can use its value in email alerts by using the token $result.web_service$. See http://docs.splunk.com/Documentation/Splunk/6.2.2/Alert/Setupalertactions#Use_tokens_in_email_notifi... for reference. Note that this may not exist for you if you're using a fairly old Splunk version.
Now I'm thoroughly confused... if the email tokens part of your issue is solved, you should probably close this question and open a new one with your transaction issue along with some sample data so people can reproduce the issue easily.
Using that very token over here works well, and it seems your alert definition is correct as well. What values does your search return for the field?
I discovered the mistake yesterday, and you also hit the nail on the head: it's an issue with my search query. I did not include TestCall="*" in my search, so the alert does not see TestCall because it's not in the search. I thought that defining it as an extracted field was enough, but I have to put it in my search query.
Now my other issue is when I put TestCall="*", a few things happen..
The unique identifier "GUID" is attached to the request and response. This allows me to pipe it into TRANSACTION and group them together and measure the response time for each web service call. So there were 10 events which were grouped together that had a response time longer than 5 seconds today
Query 1:
index=uvtrans | transaction GUID startswith="*request" endswith="*response" | WHERE duration>5
This returns back 10 events which had response times greater than 5 seconds.
Query 2:
index=uvtrans TestCall="*" | transaction GUID startswith="*request" endswith="*response"
This returns back results but the duration is not longer than 5 seconds like the original search.
Query 3:
index=uvtrans TestCall"*" | transaction GUID startswith="*request" endswith="*response" | WHERE duration>5
Now introducing the WHERE duration>5 returns back 0 results
Something more general about your search: you will miss a lot of long-running calls.
First, you're running the search every minute over -1m to now. Assuming your data has on average two seconds of latency and clock inaccuracy, you will on average miss two seconds of data for every execution.
Second, you're using transaction to merge large calls together. Say a call starts at 01:02:55 and ends at 01:03:05, it'd be ten seconds long and alert-able. However, your search running at 01:03:00 will only see a start and your search running at 01:04:00 will only see an end.
Here's an alternative suggestion to be scheduled every minute with a time range of -2m@m to @m:
index=uvtrans | transaction GUID startswith="*request" endswith="*response" | WHERE duration>5 AND _time < relative_time(now(), "-1m")
That will search each event twice, but only return events that started in the first minute of the two-minute time range. Hence an event starting in the first minute and ending in the second will be covered.
Two assumptions are needed for the timing of this to work: first, that you have low-ish latency, and second, that calls don't take longer than a minute minus that latency. If either assumption doesn't hold for you, then you'll need to extend the time range or increase the offset into the past.
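If calls can run longer than a minute, one way to extend the range might look like the sketch below. It assumes the alert is still scheduled every minute, but with a time range of -3m@m to @m. The -2m offset is my own extrapolation from the search above and has not been tested against your data.

```
index=uvtrans | transaction GUID startswith="*request" endswith="*response" | WHERE duration>5 AND _time < relative_time(now(), "-2m")
```

Each run would then search every event three times, but only report calls that started in the first minute of the three-minute range, leaving up to two minutes (minus latency) for the response to arrive.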
Thanks for the suggestion, I applied it to my alert. Any ideas on how to get this token working so I can see which web service is running slow in the email subject line?
You could call a script, sure... Bash or Python work pretty well out of the box.
Alternatively, you could post your savedsearches.conf entry for this search so we can see where the issue is and get email working like it does for thousands of others.
Here's my entry from savedsearches.conf. Thanks for the help so far
[Response Time > 5 Sec]
action.email = 1
action.email.include.trigger = 1
action.email.reportServerEnabled = 0
action.email.subject.alert = Splunk Alert: $result.Call5$ Resp > 5 Sec
action.email.to = DOTCOM_PERFORMANCE_MONITORING_ALERTS@xxxxxxx.com
action.email.useNSSubject = 1
alert.digest_mode = 0
alert.suppress = 0
alert.track = 1
counttype = number of events
cron_schedule = * * * * *
description = This alert goes off when a web service call takes longer than 5 seconds
dispatch.earliest_time = -1m
dispatch.latest_time = now
display.events.fields = ["host","TYPE","CLASSPATH","splunk_server","Status","City","Code","GUID","req","resp","duration","CLASS","linecount","GUID1"]
display.page.search.mode = verbose
display.visualizations.chartHeight = 279
display.visualizations.charting.chart = line
enableSched = 1
quantity = 0
relation = greater than
request.ui_dispatch_app = search
request.ui_dispatch_view = search
search = index=uvtrans | transaction GUID startswith="*request" endswith="*response" | WHERE duration>5
I went ahead and did another field extraction to replace Call5 to make sure that wasn't the issue. So I have a new field extraction called 'TestCall'.
I did a search for all web service calls that were greater than 5 seconds then did a field extraction for all the request names (TestCall). I only had about 10 calls greater than 5 seconds and my TestCall extraction picked up all the request names as expected.
I then changed my alert to 'Splunk Alert: $result.TestCall$ Resp > 5 Sec'
So now I have to wait for a request that takes longer than 5 seconds to see if this works.. I also included my savedsearches.conf entry above. Please advise on what else I should try if this doesn't work
What version of Splunk are you on?
Also, my bad - it is $result.Call5$ without the s... http://docs.splunk.com/Documentation/Splunk/6.2.2/Alert/Setupalertactions#Use_tokens_in_email_notifi...
I'm running 6.2
I'm pretty sure I tried both $result.Call5$ and $results.Call5$ with no luck. It looks like my only option left is to call an external script to report which web service is running slow. Do you know if it's possible for me to use JavaScript to do so?
My extracted field is named 'Call5'. I went into my alert and put the email subject as
Splunk Alert: $results.Call5$ Response > 5 Seconds
Now the only thing appearing in the subject line is Splunk Alert: