Splunk Search

Help Needed with Real Time Query for SLA violations for each API in sourcetype or alternative approaches if not possible

iTechEvent
Explorer

The use case I am working on:

  • I have one sourcetype and one index.
  • In the event log there are several APIs with responsetime fields.
  • There is an SLA value for responsetime for each API. It is not in the event log as a field, but is available as a value to use in the query.
  • For each of the 20 APIs in the event log, I need to compute the fraction of violations (response time greater than the SLA) over a given time range for a historic search, say the past 15 minutes.
  • For each API, the query needs to determine the number of calls during this time range, the number of SLA violations during this time range, and the fraction of violations.
  • The SLA value is not the same across APIs. Each API has its own SLA value for computing violations.
  • The time range is the same across all APIs.

Here is a historic search that does this easily using append for the past 15 minutes, which I am trying to convert to a real-time search.

index=i1 sourcetype=s1 uri_path="api1" | eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}") | eval SLA=1000 | stats count as count, count(eval(responsetime>SLA)) as violations,first(SLA) as SLA by uri_path | append [ search index=i1 sourcetype=s1 uri_path="api2" | eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}") | eval SLA=750 | stats count as count, count(eval(responsetime>SLA)) as violations,first(SLA) as SLA by uri_path ]

But I need to do this in real time, and append doesn't work for that. I also believe there could be issues with combining multiple searches in a real-time search the way I used to for historic searches, though I am not fully sure.

Here is one that I tried.

index=i1 sourcetype=s1 uri_path="api1" OR uri_path="api2" | eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}") | eval SLA=750 | stats count as count, count(eval(responsetime>SLA)) as violations,first(SLA) as SLA by uri_path

The issue is that the SLA value is not the same for all APIs. It's different for each API.

Perhaps there are limitations on using multiple queries in real time, and a single query should handle this when converting the above use case from a historic to a real-time search. Since append doesn't work, I am not sure whether map, join, etc. can work either, because they also involve two queries in conjunction.
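One single-search option that avoids append, sketched under the assumption that there are only a handful of APIs and that hard-coding their SLA values in the query is acceptable, is to assign SLA per event with a case() expression. The values 1000 and 750 are the ones from the historic search above:

```spl
index=i1 sourcetype=s1 uri_path="api1" OR uri_path="api2"
| eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}")
| eval SLA=case(uri_path=="api1", 1000, uri_path=="api2", 750)
| stats count as count, count(eval(responsetime>SLA)) as violations, first(SLA) as SLA by uri_path
```

Events whose uri_path matches none of the case() branches get a null SLA, so they are counted in count but never as violations.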

I need help writing a Splunk query for this use case as a real-time search.

If there are limitations for this use case, an alternative is to run a scheduled historic search every minute. In that case the query has to run fast and finish well within the minute, so search acceleration could be a consideration.

Any suggestions for this as well?

Any help will be appreciated.


somesoni2
Revered Legend

The best approach (one I use as well) is to create a lookup table file for the SLA values and then reference it in your query using the lookup command.

Lookup table: api_sla.csv
Lookup Fields: api_name, sla
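A minimal sketch of one way to create that lookup file from the search bar, assuming the two SLA values quoted in the question (1000 for api1, 750 for api2). The api_name values here are placeholders and would need to match uri_path as it looks after the {id} replacement in the query:

```spl
| makeresults count=2
| streamstats count as n
| eval api_name=case(n==1, "api1", n==2, "api2")
| eval sla=case(n==1, 1000, n==2, 750)
| table api_name sla
| outputlookup api_sla.csv
```

Running `| inputlookup api_sla.csv` afterwards should list the two rows. For all 20 APIs, uploading a CSV with the same two columns is probably less tedious than extending the case() expressions.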

Updated (sample query, assuming the API name is in uri_path; if not, use the field that contains the API name in the lookup command):

index=i1 sourcetype=s1 uri_path="api1" OR uri_path="api2" | eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}") | lookup api_sla.csv api_name as uri_path OUTPUT sla as SLA | stats count as count, count(eval(responsetime>SLA)) as violations,first(SLA) as SLA by uri_path
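The question also asks for the fraction of violations. A sketch of how the sample query could be extended with one more eval; violation_fraction is a name chosen here for illustration, and the rounding is optional:

```spl
index=i1 sourcetype=s1 uri_path="api1" OR uri_path="api2"
| eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}")
| lookup api_sla.csv api_name as uri_path OUTPUT sla as SLA
| stats count as count, count(eval(responsetime>SLA)) as violations, first(SLA) as SLA by uri_path
| eval violation_fraction=round(violations/count, 4)
```

Because stats emits a row only for uri_path values that have at least one event, count is never zero in the final eval.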

iTechEvent
Explorer

http://docs.splunk.com/Documentation/Splunk/6.0.2/SearchReference/Streamstats

It looks like streamstats is more appropriate from a streaming perspective. How would I explain this to an unfamiliar audience? What about the choice of stats over streamstats? It looks like streamstats is better suited to something like a moving average over the last 5 events in the real-time window, whereas stats works on the entire collection of events in the real-time window. Are we losing anything by not using streamstats when real-time streaming is used?
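To make the contrast concrete, here is a sketch that assumes the api_sla.csv lookup from the accepted answer; violated and recent_violation_rate are names invented for this example. The stats query in the answer collapses the window into one row per API. streamstats instead keeps every event and attaches a running value to it, here the violation rate over the last 5 events of each API, in the order the results are returned:

```spl
index=i1 sourcetype=s1 uri_path="api1" OR uri_path="api2"
| eval uri_path=replace(uri_path, "\w{8}-\w{4}-\w{4}-\w{4}-\w{12}", "{id}")
| lookup api_sla.csv api_name as uri_path OUTPUT sla as SLA
| eval violated=if(responsetime>SLA, 1, 0)
| streamstats window=5 global=false avg(violated) as recent_violation_rate by uri_path
```

global=false asks for a separate 5-event window per uri_path rather than one window shared by all APIs. If what is needed is a single fraction per API over the whole time range, as in the original question, the stats form should be enough.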

somesoni2
Revered Legend

Great. Please accept the answer if there are no follow-up questions.

iTechEvent
Explorer

Thanks. Works as expected!
