SoS 3.0 nfsiostat

rmorlen
Splunk Employee

I see a new input in SoS 3.0:

NEW DATA INPUT! - Scripted input 'nfs-iostat_sos.py' is now available to monitor the I/O usage of pooled search-heads on the shared NFS device.

I don't see any dashboards that use the data collected by the script. Any plans for this?

1 Solution

hexx
Splunk Employee

Absolutely! In a future release (hopefully, the next one), we plan to ship a view using the events generated by this scripted input to show the I/O bandwidth usage and the performance and responsiveness of the shared NFS device in a search-head pool.

In the meantime, you can always run manual searches against the data collected by nfs-iostat_sos.py to that end.

Here are a couple of simple examples:

  • Median IOPS and total OP count per OP type for the past 5 minutes:


    index=sos source="nfs-iostat_sos.py" earliest=-5m
    | stats sum(op_count) median(ops_per_sec) by op_type

  • Worst-case round-trip time (RTT) over time for GETATTR, LOOKUP, and ACCESS calls:


    index=sos source="nfs-iostat_sos.py" (op_type=GETATTR OR op_type=LOOKUP OR op_type=ACCESS)
    | timechart max(rtt_per_op) by op_type
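
Because a single OP type can dominate the raw counts, it may also help to look at each type as a share of all operations. This is only a sketch, untested against real data, and it assumes the same op_count and op_type fields used in the searches above; total_ops, all_ops, and pct_of_ops are names made up here for illustration:

    index=sos source="nfs-iostat_sos.py" earliest=-5m
    | stats sum(op_count) AS total_ops by op_type
    | eventstats sum(total_ops) AS all_ops
    | eval pct_of_ops=round(100 * total_ops / all_ops, 1)
    | sort - pct_of_ops

The eventstats step attaches the grand total to every row so that eval can compute the percentage per OP type.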

rmorlen
Splunk Employee

Or I guess my real question is: what is the best way to track this over a long period of time (say, 90 days) so that we can determine whether things today are similar to 90 days ago? Something like:

    index=sos source="nfs-iostat_sos.py" op_type=getattr earliest=-90d
    | timechart span=1d avg(kBps) by host
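
A possible variation on the same idea is to chart a daily median next to a high percentile, so that an occasional spike does not hide in an average. This is a rough, untested sketch: it assumes the rtt_per_op field from the earlier examples is populated for GETATTR events and that the sos index actually retains 90 days of data; median_rtt and p95_rtt are illustrative names only:

    index=sos source="nfs-iostat_sos.py" op_type=GETATTR earliest=-90d
    | bin _time span=1d
    | stats median(rtt_per_op) AS median_rtt perc95(rtt_per_op) AS p95_rtt by _time host

Comparing the most recent rows with the oldest ones per host would then show whether response times have drifted over the period.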

rmorlen
Splunk Employee

Good information. Thank you. Now, what is considered good/normal vs. bad? GETATTR skews the numbers: your first query returns a sum(op_count) of 1.9M for GETATTR vs. 190K for LOOKUP (the next highest).
