Is there a way to alert when there is "excessive" packet loss with CDR?

rlangermann
New Member

I am trying to think of ways to proactively alert when there are bad video calls in our environment. I see that I can go to Browse -> select the fields I want to report and then create this as a new search and then create alerts based on it. I've added Jitter, Packet Loss, etc. to the search.

What I tried to do was create a custom trigger condition:

Trigger Conditions -> Custom ->
"numberPacketsLost > 100"

I get a "cannot parse search condition" error. I know that this is probably user error, since I just started using Splunk recently. Does anyone have any tips to facilitate something like this? I was just going to have it email me as a test and then eventually pump it into a Slack channel, or something along those lines.

Richfez
SplunkTrust

Sorry for the late answer!

Yes, there is a way to alert when a certain threshold like that is crossed.

For the sake of completeness, I'll start from scratch.

In the CDR app, go to General Report.
Set it to chart the avg of numberPacketsLost over gateway.

NOTE: you can use max, sum, average, or even 95th percentile. Play with the resulting reports a bit over a time frame when you think you should have gotten an alert, and see what the resulting graph shows you!

Once you have a graph you like, you can proceed. Don't worry that it shows a lot of gateways (or whatever you may have split by) that are very low and wouldn't be alerted on - that's OK for now; we'll trim those out in a bit.

Click the link on the right to "see raw search syntax". That will open the raw search in a new window, and you'll see both the results and a search that looks somewhat like this:

`cdr_and_cmr_events`
| `normalize_qos_fields`
| stats list(gateway) as gateway list(numberPacketsLost) as numberPacketsLost by globalCallID_callId globalCallID_callManagerId globalCallId_ClusterID
| chart avg(numberPacketsLost) over gateway
| sort 0 avg(numberPacketsLost) desc

Now, in my results I have a gateway X that has far higher dropped packets than any other, the average being almost 3 when all the rest average under 1. (I actually think 95th percentile is normally a better metric to use for something like this, but it's easy enough to go back to General Report and give that a try and repeat these steps with that.)
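
If you'd like to try the 95th percentile without clicking back through the General Report UI, here's a quick variation on the raw search above - same macros and grouping, just swapping the aggregation (a sketch; adjust to taste):

`cdr_and_cmr_events`
| `normalize_qos_fields`
| stats list(gateway) as gateway list(numberPacketsLost) as numberPacketsLost by globalCallID_callId globalCallID_callManagerId globalCallId_ClusterID
| chart perc95(numberPacketsLost) over gateway
| sort 0 perc95(numberPacketsLost) desc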

Let's make some adjustments to that search. We don't need that sort any more, instead we'd like to filter it to things higher than N, which for MY purpose I'll define as 2. Your definition will be different, so be sure to substitute in your value there instead of mine.

So change
| sort 0 avg(numberPacketsLost) desc
to
| where 'avg(numberPacketsLost)' > 2
Now, when that is run the search should return ONLY those rows you'd like to alert on.
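
Put together, the modified search looks like this (using my threshold of 2 - substitute your own):

`cdr_and_cmr_events`
| `normalize_qos_fields`
| stats list(gateway) as gateway list(numberPacketsLost) as numberPacketsLost by globalCallID_callId globalCallID_callManagerId globalCallId_ClusterID
| chart avg(numberPacketsLost) over gateway
| where 'avg(numberPacketsLost)' > 2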

Now, click Save As > Alert (in the Splunk menu, upper right).
Name it whatever you like; I'll use "Alert for high call packet losses".
I'm going to set mine to run every hour, but see the scheduling options I discuss below.
The Trigger conditions I'll set to trigger when number of results is greater than 0, once.
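
As an aside on the "cannot parse search condition" error from the original question: a Custom trigger condition has to be written as a search that runs against the alert's results, not as a bare comparison. So something along these lines would parse (a sketch, assuming the alert's results still contain a numberPacketsLost field):

search numberPacketsLost > 100

For this walkthrough, though, the plain "number of results is greater than 0" condition does the job, since the where clause already did the filtering.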

I generally recommend adding a "throttle" to keep this from spamming you too hard.
Then add a trigger action that emails YOURSELF, because wow, you can really accidentally spam someone before you get all the details worked out, and nothing aggravates coworkers more than spammy alert emails they don't need!

A short discussion of schedules. First, a warning - please, please, please think really, really hard before using a real-time schedule. They're terrible, nasty, performance-impacting things, and most importantly, 99% of the time or more they're not necessary. Think about it this way: in "real time" (which may be delayed by 5 seconds to a minute or more anyway, and heaven only knows how long your email system may take to pass that message through!), what exactly would anyone do right that instant? If the alert were set to check once every minute or every 5 minutes, it's far less load on the system, and the end result is still a near-instant alert.

OK, ok, enough about the warnings. Real-time - just don't do it, K? And if you do, I'm not responsible. 🙂

So, back to schedules. If you wanted to run this over the last 5 minutes each time, on a 5 minute schedule, do this:
Alert type Scheduled,
Run on cron schedule
Select Time range and in that dialog go to advanced and put in Earliest of -5m@m and Latest of @m. (This tells it to go back 5 minutes to the nearest minute, and only look up to the nearest minute boundary - it's not strictly necessary here, but it's a good practice to get into). Click Apply and when you get back to the Save As Alert screen you'll see Splunk has changed that into Last 5 minutes. Isn't it cute?
For the cron expression put in */5 * * * *, which means "for all minutes divisible by 5, in any hour, on any day of the month, in any month, on any day of the week, do this thing". You can google "cron tutorial" or something if you need more help with other variations. Also, "once per minute" would just be * * * * *, because cron runs at most once per minute. 🙂
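
For the curious, when you save the alert, the UI writes all of these settings into savedsearches.conf behind the scenes. Roughly, the stanza would look something like the sketch below (trimmed to the interesting keys, with a placeholder email address and my example 60-minute throttle - check your own conf after saving rather than copying this verbatim):

[Alert for high call packet losses]
search = `cdr_and_cmr_events` | `normalize_qos_fields` | stats list(gateway) as gateway list(numberPacketsLost) as numberPacketsLost by globalCallID_callId globalCallID_callManagerId globalCallId_ClusterID | chart avg(numberPacketsLost) over gateway | where 'avg(numberPacketsLost)' > 2
enableSched = 1
cron_schedule = */5 * * * *
dispatch.earliest_time = -5m@m
dispatch.latest_time = @m
counttype = number of events
relation = greater than
quantity = 0
alert.suppress = 1
alert.suppress.period = 60m
action.email = 1
action.email.to = you@example.com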

And I think you click Save now?

Sort of lost track, but I think that's it. 🙂

If you'd like to give that a shot, let us know how it goes! Also for questions about the Cisco CDR app, try to tag your question with the "Cisco CDR Reporting and Analytics" app - that way we'll notice it faster and get you an answer more quickly.

Happy Splunking,
Rich
