Getting Data In

Need to schedule IO wait alerts on Splunk

vikram_m
Path Finder

Our Splunk infrastructure runs on Azure, and we recently faced a major issue where I/O wait time was high, so indexing and all the data pipeline queues were affected.

As an RCA item, we have decided to schedule I/O wait time alerts on the infrastructure so that we can tell whether something is wrong in our Splunk config or whether Azure storage is causing the data pipeline queues to pile up.

Please let us know how we can schedule I/O alerts on Splunk.

Thanks.
Vikram.


adonio
Ultra Champion

Hello there, I might be off with my answer, but I thought it worthwhile to bring this to your attention, and I needed to post it as an answer in order to attach screenshots.
You can use the DMC (or MC); it has pre-built alerts on indexing queues and indexer performance (see screenshot 1).
You can also navigate in the DMC to Resource Usage: Machine and scroll down to see the I/O graph.
Opening that panel in Search shows the following:

 `dmc_set_index_introspection` sourcetype=splunk_resource_usage component=IOStats host=<yourHost>
              | eval mount_point = 'data.mount_point'
              | eval reads_ps = 'data.reads_ps'
              | eval writes_ps = 'data.writes_ps'
              | eval interval = 'data.interval'
              | eval op_count = (reads_ps + writes_ps) * interval
              | eval avg_service_ms = 'data.avg_service_ms'
              | eval avg_wait_ms = 'data.avg_total_ms'
              | eval cpu_pct = 'data.cpu_pct'
              | eval network_pct = 'data.network_pct'
              | `dmc_timechart_for_iostats` per_second(op_count) as iops, avg(data.cpu_pct) as avg_cpu_pct, avg(data.avg_service_ms) as avg_service_ms, avg(data.avg_total_ms) as avg_wait_ms, avg(data.network_pct) as avg_network_pct
              | eval iops = round(iops)
              | eval avg_cpu_pct = round(avg_cpu_pct)
              | eval avg_service_ms = round(avg_service_ms)
              | eval avg_wait_ms = round(avg_wait_ms)
              | eval avg_network_pct = round(avg_network_pct)
              | fields _time, iops avg_wait_ms
              | rename avg_wait_ms as "Wait Time"

You can modify this search and use it as a base for your alerts.
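
As a rough illustration, a trimmed-down alert search over the same introspection data might look like the sketch below. It assumes that `dmc_set_index_introspection` resolves to index=_introspection on your deployment. The 15-minute window and the 50 ms threshold are placeholders of mine, not recommended values, so tune them against your own baseline.

 index=_introspection sourcetype=splunk_resource_usage component=IOStats earliest=-15m
              | stats avg(data.avg_total_ms) as avg_wait_ms by host, data.mount_point
              | eval avg_wait_ms = round(avg_wait_ms)
              | where avg_wait_ms > 50

Saved with Save As > Alert on a schedule (for example, every 15 minutes) and set to trigger when the number of results is greater than 0, this would fire once for any host and mount point whose average wait time exceeds the threshold.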
Hope it helps.

Screenshot 1: [image not available]
