
SoS 3.0 nfsiostat

rmorlen
Splunk Employee

I see a new input in SoS 3.0:

NEW DATA INPUT! - Scripted input 'nfs-iostat_sos.py' is now available to monitor the I/O usage of pooled search-heads on the shared NFS device.

I don't see any dashboards that use the data collected by the script. Any plans for this?

hexx
Splunk Employee

Absolutely! In a future release (hopefully, the next one), we plan to ship a view using the events generated by this scripted input to show the I/O bandwidth usage and the performance and responsiveness of the shared NFS device in a search-head pool.

In the meantime, you can always run manual searches against the data collected by nfs-iostat_sos.py to that end.

Here are a couple of simple examples:

  • Median IOPS and total OP count per OP type for the past 5 minutes:


    index=sos source="nfs-iostat_sos.py" earliest=-5m
    | stats sum(op_count) median(ops_per_sec) by op_type

  • Worst-case round-trip time (RTT) over time for GETATTR, LOOKUP, and ACCESS calls:


    index=sos source="nfs-iostat_sos.py" (op_type=GETATTR OR op_type=LOOKUP OR op_type=ACCESS)
    | timechart max(rtt_per_op) by op_type
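To make the semantics of the two searches above concrete, here is a plain-Python approximation of what they compute. This is a minimal sketch, not part of SoS. It assumes each event is a dict carrying the `_time`, `op_type`, `op_count`, `ops_per_sec` and `rtt_per_op` fields used in the searches. The helper names `stats_by_op_type` and `timechart_max_rtt` and the sample events are hypothetical. Python's `statistics.median` averages the two middle values for an even count, which may not match Splunk's `median()` exactly. Also, `timechart` picks its own span unless you pass `span=`, whereas the sketch uses a fixed 60-second bucket.

```python
from collections import defaultdict
from statistics import median

def stats_by_op_type(events, now, window_s=300):
    """Approximates: earliest=-5m | stats sum(op_count) median(ops_per_sec) by op_type"""
    totals = defaultdict(int)   # op_type -> summed op_count
    rates = defaultdict(list)   # op_type -> every ops_per_sec sample in the window
    for e in events:
        if e["_time"] < now - window_s:
            continue  # older than the 5-minute window
        totals[e["op_type"]] += e["op_count"]
        rates[e["op_type"]].append(e["ops_per_sec"])
    return {op: (totals[op], median(rates[op])) for op in totals}

def timechart_max_rtt(events, op_types, span_s=60):
    """Approximates: <op_type filter> | timechart span=1m max(rtt_per_op) by op_type"""
    buckets = defaultdict(dict)  # bucket start -> {op_type: worst RTT seen}
    for e in events:
        if e["op_type"] not in op_types:
            continue  # same effect as the OR'ed op_type filter
        start = e["_time"] - e["_time"] % span_s  # align to the start of its time bucket
        worst = buckets[start].get(e["op_type"])
        if worst is None or e["rtt_per_op"] > worst:
            buckets[start][e["op_type"]] = e["rtt_per_op"]
    return {start: buckets[start] for start in sorted(buckets)}

# Hypothetical sample events (epoch seconds), not real SoS output.
events = [
    {"_time": 1000, "op_type": "GETATTR", "op_count": 50, "ops_per_sec": 10.0, "rtt_per_op": 0.4},
    {"_time": 1060, "op_type": "GETATTR", "op_count": 70, "ops_per_sec": 14.0, "rtt_per_op": 0.9},
    {"_time": 1100, "op_type": "LOOKUP", "op_count": 5, "ops_per_sec": 1.0, "rtt_per_op": 1.2},
    {"_time": 1130, "op_type": "READ", "op_count": 8, "ops_per_sec": 2.0, "rtt_per_op": 3.0},
]

print(stats_by_op_type(events, now=1200))
# {'GETATTR': (120, 12.0), 'LOOKUP': (5, 1.0), 'READ': (8, 2.0)}
print(timechart_max_rtt(events, {"GETATTR", "LOOKUP", "ACCESS"}))
# {960: {'GETATTR': 0.4}, 1020: {'GETATTR': 0.9}, 1080: {'LOOKUP': 1.2}}
```

The `stats` case collapses everything into one row per op type. The `timechart` case keeps a separate maximum per time bucket, which is why it is better suited to spotting RTT spikes.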

rmorlen
Splunk Employee

Or, I guess my real question is: what is the best way to track this over a long period of time (say, 90 days), so that we can tell whether things look the same today as they did 90 days ago? Something like:

    index=sos source="nfs-iostat_sos.py" op_type=getattr earliest=-90d
    | timechart span=1d avg(kBps) by host
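The daily roll-up in the search above can be sketched in Python to show what it computes and how a "same as 90 days ago?" check might look. This is a hypothetical sketch. It assumes events are dicts carrying `_time`, `host` and the `kBps` field named in the search; that field is taken from the search as written and is not verified against the script's output. The helpers `daily_avg_by_host` and `latest_vs_oldest` and the sample events are made up for illustration. Days are bucketed at UTC midnight, which may differ from Splunk's own `span=1d` alignment to the user's time zone.

```python
from collections import defaultdict

DAY = 86400  # seconds per day

def daily_avg_by_host(events, now, days=90, field="kBps"):
    """Approximates: earliest=-90d | timechart span=1d avg(kBps) by host"""
    sums = defaultdict(lambda: defaultdict(float))  # day -> host -> running total
    counts = defaultdict(lambda: defaultdict(int))  # day -> host -> sample count
    for e in events:
        if e["_time"] < now - days * DAY or field not in e:
            continue  # outside the window, or the event lacks the field
        day = e["_time"] - e["_time"] % DAY  # UTC midnight (assumption, see above)
        sums[day][e["host"]] += e[field]
        counts[day][e["host"]] += 1
    return {day: {h: sums[day][h] / counts[day][h] for h in sums[day]} for day in sorted(sums)}

def latest_vs_oldest(daily):
    """Per host, ratio of the newest day's average to the oldest day's average."""
    if not daily:
        return {}
    oldest, newest = daily[min(daily)], daily[max(daily)]
    # Hosts missing (or zero) on the oldest day are skipped to avoid dividing by zero.
    return {h: newest[h] / oldest[h] for h in newest if oldest.get(h)}

# Hypothetical sample events, not real SoS output.
events = [
    {"_time": 0, "host": "sh1", "kBps": 100.0},
    {"_time": 3600, "host": "sh1", "kBps": 200.0},
    {"_time": 89 * DAY, "host": "sh1", "kBps": 300.0},
    {"_time": 89 * DAY + 60, "host": "sh2", "kBps": 50.0},
]

daily = daily_avg_by_host(events, now=90 * DAY)
print(latest_vs_oldest(daily))  # {'sh1': 2.0}
```

A ratio near 1.0 means a host's daily average is roughly where it was at the start of the window. A single day is noisy, though, so averaging the first and last week would be a more robust variant of the same idea.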

rmorlen
Splunk Employee

Good information, thank you. Now, what is considered good/normal vs. bad? GETATTR skews the numbers: your first query returns a sum(op_count) of 1.9M for GETATTR vs. 190K for LOOKUP, the next highest.
