SoS 3.0 nfsiostat

rmorlen
Splunk Employee

I see a new input in SoS 3.0:

NEW DATA INPUT! - Scripted input 'nfs-iostat_sos.py' is now available to monitor the I/O usage of pooled search-heads on the shared NFS device.

I don't see any dashboards that use the data collected by the script. Any plans for this?

1 Solution

hexx
Splunk Employee

Absolutely! In a future release (hopefully, the next one), we plan to ship a view using the events generated by this scripted input to show the I/O bandwidth usage and the performance and responsiveness of the shared NFS device in a search-head pool.

In the meantime, you can always run manual searches against the data collected by nfs-iostat_sos.py to that end.

Here are a couple of simple examples:

  • Median IOPS and total OP count per OP type for the past 5 minutes:


    index=sos source="nfs-iostat_sos.py" earliest=-5m
    | stats sum(op_count) median(ops_per_sec) by op_type

  • Worst-case round-trip time (RTT) for GETATTR, LOOKUP, and ACCESS calls:


    index=sos source="nfs-iostat_sos.py" (op_type=GETATTR OR op_type=LOOKUP OR op_type=ACCESS)
    | timechart max(rtt_per_op) by op_type

rmorlen
Splunk Employee

Or I guess my real question is: what is the best way to track this over a long period of time (say, 90 days) so that we can tell whether things today look similar to 90 days ago? Something like:

    index=sos source="nfs-iostat_sos.py" op_type=getattr earliest=-90d
    | timechart span=1d avg(kBps) by host
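
A variant of the same idea, as a rough and untested sketch: it compares each day's average against a per-host baseline over the whole window, so drift shows up as a percentage instead of a raw number. It assumes the kBps field is present on these events and that the sos index actually retains 90 days of data; baseline_kBps and pct_of_baseline are names made up here for illustration.

    index=sos source="nfs-iostat_sos.py" op_type=GETATTR earliest=-90d
    | bin _time span=1d
    | stats avg(kBps) AS avg_kBps by _time host
    | eventstats median(avg_kBps) AS baseline_kBps by host
    | eval pct_of_baseline = round(100 * avg_kBps / baseline_kBps, 1)

Days where pct_of_baseline sits far from 100 for a given host would be the ones to look at more closely.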

rmorlen
Splunk Employee

Good information. Thank you. Now, what is considered good/normal vs. bad? GETATTR skews the information: your first query returns a sum(op_count) of 1.9M for GETATTR vs. 190K for LOOKUP (the next highest).
