Splunk Search

Looking for a License Usage query to generate external scripted alerts with

awurster
Contributor

hi guys i'm looking for help around license usage.

i'm trying to troubleshoot a recent license violation where some of our warnings went unnoticed, because they had no context and ran at the wrong time of day. i want to rewrite our searches now to be more robust / contextual - and the DMC searches either don't cut it for us, or i'm not sure how to read / use them effectively.

honestly, license usage in Splunk's a bit of a mess now (IMHO) - some searches use rest, some the "dmc", and others still the "_internal" index with license logs... so i'm a bit lost. there's also a lot of time modifiers and joined subsearches in the "stock" reports which seem like overkill, or just confuse me entirely, so i can't get what i need.

that said, i got close with the following search, but at first the calculated eval fields weren't showing up - the percentage eval referenced a totalGB field that was never defined, and ran after a table had already narrowed the field list. fixed version below; earliest is -31d@d and latest is @d

index=_internal source=*license_usage.log type="RolloverSummary"
  | bin _time span=1d
  | stats sum(b) AS used max(stacksz) AS quota by _time
  | where used > quota
  | eval usedGB=round(used/1024/1024/1024,3)
  | eval quotaGB=round(quota/1024/1024/1024,3)
  | eval percentage=round(usedGB / quotaGB * 100, 1)
  | eval usage = usedGB . " (" . percentage . "%)"
  | table _time usedGB quotaGB usage

i want to have two flavors for my alerts:

  1. a violation "notice" that between 1 and 3 notices have occurred (we can group them on the receiving end)
  2. an outage is about to happen / has happened, with between 4 and 5 violations

both reports should be pretty similar, so i'd like to meet the following requirements for both:

  1. run as close to the "splunk" calculations as possible (should i run this at 12:30 AM, 1 AM, etc?), to avoid a lag or incorrect calculation
  2. include instance name (i.e. search head name)
  3. include pool / stack size
  4. include a usage percentage (i.e. used GB / quota GB)
  5. have a count tally over the 30 day period to see how many violations i've had in that rolling period
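outside of SPL, the math i'm after is simple enough to sanity-check in a few lines of plain python (the quota and daily byte counts below are made-up sample numbers, not from our license master):

```python
# illustrative sketch only (not Splunk code): daily usage percentage against
# a quota, plus a violation tally over a rolling 30-day window.

QUOTA_BYTES = 50 * 1024**3  # hypothetical 50 GB daily license quota

# made-up daily indexed-byte totals, one entry per day
daily_usage_bytes = [48 * 1024**3, 52 * 1024**3, 30 * 1024**3, 51 * 1024**3]

violations = 0
for used in daily_usage_bytes[-30:]:          # rolling 30-day window
    used_gb = round(used / 1024**3, 3)
    quota_gb = round(QUOTA_BYTES / 1024**3, 3)
    pct = round(used_gb / quota_gb * 100, 1)  # usage percentage, e.g. 102.0
    if used > QUOTA_BYTES:                    # over quota -> one violation day
        violations += 1

# 1-3 violations -> grouped "notice"; 4-5 -> imminent outage
severity = "notice" if violations <= 3 else "outage-warning"
```

with the sample data above, two of the four days exceed quota, so the tally lands in "notice" territory.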
1 Solution

awurster
Contributor

OK so i've broken this into a few different alerts to get what i need. hopefully as we scale out to using different license pools, this will translate nicely.

1 - a twice-hourly warning if, at any point in the day, we go above 90% utilisation.

| rest splunk_server_group=dmc_group_license_master /services/licenser/pools
| join type=outer stack_id splunk_server [rest splunk_server_group=dmc_group_license_master /services/licenser/groups | search is_active=1 | eval stack_id=stack_ids | fields splunk_server stack_id is_active] 
| search is_active=1 
| fields splunk_server, stack_id, used_bytes 
| join type=outer stack_id splunk_server [rest splunk_server_group=dmc_group_license_master /services/licenser/stacks | eval stack_id=title | eval stack_quota=quota | fields splunk_server stack_id stack_quota] 
| stats sum(used_bytes) as used_bytes max(stack_quota) as stack_quota values(stack_id) as stack_id by splunk_server
| eval usedGB=round(used_bytes/1024/1024/1024,1) 
| eval totalGB=round(stack_quota/1024/1024/1024,1) 
| eval percentage=round(usedGB / totalGB, 3)*100 
| fields splunk_server, stack_id, percentage, usedGB, totalGB 
| where percentage > 90 
| rename splunk_server AS Instance, percentage AS "License quota used (%)", usedGB AS "License quota used (GB)", totalGB as "Total license quota (GB)"

2 - a once-daily warning for between 1 and 4 license violations in the rolling 30-day window (note the search runs over "all time" because this log is only kept for the 30-day period anyhow)

index=_internal source=*license_usage.log type="RolloverSummary"
  | bin _time span=1d
  | convert timeformat="%F" ctime(_time) AS date
  | stats sum(b) AS used max(stacksz) AS quota by date, pool, stack
  | eval usedGB=round(used/1024/1024/1024,3) 
  | eval quotaGB=round(quota/1024/1024/1024,3)
  | eval usedPct = round(usedGB / quotaGB * 100, 1)
  | where usedPct > 60
  | eval violation_id=1
  | eval usage = usedGB . " (" . usedPct . "%)"
  | streamstats global=f sum(violation_id) AS violations
  | fields date stack pool usedGB quotaGB usage violations
  | rename usedGB AS "used", quotaGB AS "quota"

3 - a final alert (slightly different title and severity) for the 4th (and 5th, if you get there) violation. same search as above, just different counts.
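for what it's worth, the split between alerts 2 and 3 is just a threshold on that running violations count - roughly this mapping (a python sketch of the intent; the function name and labels are mine, not part of the searches):

```python
def alert_severity(violations: int) -> str:
    """Map a rolling 30-day violation tally to an alert flavor.

    1-3 violations -> grouped "notice" alerts (alert 2)
    4+  violations -> outage warning (alert 3)
    Hypothetical helper for illustration only.
    """
    if violations >= 4:
        return "outage-warning"
    if violations >= 1:
        return "violation-notice"
    return "ok"
```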

as for points 2 and 3... i can add a tail at the end to grab just the last row, so that in my alert system i see the alerts come in one at a time. like:

...
  | fields date stack pool usedGB quotaGB usage violations
  | tail 1
  | rename usedGB AS "used", quotaGB AS "quota"
...
