
Why is the Splunk for Unix vmstat.sh script calculating incorrectly?

tkwaller
Builder

vmstat.sh is either interpreting the vmstat data incorrectly or calculating incorrectly.

the script is:

. `dirname $0`/common.sh

HEADER='memTotalMB   memFreeMB   memUsedMB  memFreePct  memUsedPct   pgPageOut  swapUsedPct   pgSwapOut   cSwitches  interrupts       forks   processes     threads  loadAvg1mi'
HEADERIZE="BEGIN {print \"$HEADER\"}"
PRINTF='END {printf "%10d  %10d  %10d  %10.1f  %10.1f  %10s   %10.1f  %10s  %10s  %10s  %10s  %10s  %10s  %10.2f\n", memTotalMB, memFreeMB, memUsedMB, memFreePct, memUsedPct, pgPageOut, swapUsedPct, pgSwapOut, cSwitches, interrupts, forks, processes, threads, loadAvg1mi}'
DERIVE='END {memUsedMB=memTotalMB-memFreeMB; memUsedPct=(100.0*memUsedMB)/memTotalMB; memFreePct=100.0-memUsedPct; swapUsedPct=swapUsed ? (100.0*swapUsed)/(swapUsed+swapFree) : 0}'

if [ "x$KERNEL" = "xLinux" ] ; then
    assertHaveCommand uptime
    assertHaveCommand ps
    assertHaveCommand vmstat
    CMD='eval uptime ; ps -e | wc -l ; ps -eT | wc -l ; vmstat -s'
    PARSE_0='NR==1 {loadAvg1mi=0+$(NF-2)} NR==2 {processes=$1} NR==3 {threads=$1}'
    PARSE_1='/total memory$/ {memTotalMB=$1/1024} /free memory$/ {memFreeMB+=$1/1024} /buffer memory$/ {memFreeMB+=$1/1024} /swap cache$/ {memFreeMB+=$1/1024}'
    PARSE_2='/pages paged out$/ {pgPageOut=$1} /used swap$/ {swapUsed=$1} /free swap$/ {swapFree=$1} /pages swapped out$/ {pgSwapOut=$1}'
    PARSE_3='/interrupts$/ {interrupts=$1} /CPU context switches$/ {cSwitches=$1} /forks$/ {forks=$1}'
    MASSAGE="$PARSE_0 $PARSE_1 $PARSE_2 $PARSE_3 $DERIVE"
fi    # branches for other kernels are not shown in this post

$CMD | tee $TEE_DEST | $AWK "$HEADERIZE $MASSAGE $FILL_BLANKS $PRINTF"  header="$HEADER"
echo "Cmd = [$CMD];  | $AWK '$HEADERIZE $MASSAGE $FILL_BLANKS $PRINTF' header=\"$HEADER\"" >> $TEE_DEST

which results in:

/opt/splunkforwarder/etc/apps/TA-stubhub-all-lin-uf-SplunkforUnix/bin/vmstat.sh
memTotalMB   memFreeMB   memUsedMB  memFreePct  memUsedPct   pgPageOut  swapUsedPct   pgSwapOut   cSwitches  interrupts       forks   processes     threads  loadAvg1mi
    128950       95092       33858        73.7        26.3  8266537986          2.7      156773  1430358599  1540011497   566455287         769        1182        0.38

But when I go to the host and run the same command, I can see that these numbers are incorrect:

eval uptime ; ps -e | wc -l ; ps -eT | wc -l ; vmstat -s
20:26:10 up 332 days, 21:21, 1 user, load average: 0.77, 0.70, 0.75
761
1199
132045632 total memory
130138512 used memory
109152400 active memory
10666064 inactive memory
1907120 free memory
1185436 buffer memory
94279704 swap cache
8388604 total swap
224172 used swap
8164432 free swap
1506464540 non-nice user cpu ticks
5413 nice user cpu ticks
183450991 system cpu ticks
136289253029 idle cpu ticks
1112025 IO-wait cpu ticks
15060 IRQ cpu ticks
6827407 softirq cpu ticks
0 stolen cpu ticks
153941247 pages paged in
8265286006 pages paged out
83885 pages swapped in
156773 pages swapped out
1534356594 interrupts
1419375615 CPU context switches
1454713450 boot time
566387546 forks

This host is roughly 97% used and 3% free

free -m
             total       used       free     shared    buffers     cached
Mem:        128950     127061       1889          0       1157      92060

acharlieh
Influencer

It's actually quite simple: http://www.linuxatemyram.com/

In short, the applications on your host are not actually using 97% of your RAM; they're only using about 26%.

See the buffers and cached columns in your free output? Your Linux kernel is using otherwise idle RAM to make your disk accesses faster. If an application needs more RAM, part of this cache will be freed up and made available to it.
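To put numbers on that, here is a rough sketch that takes the four memory lines from the vmstat -s output in the question and computes used memory two ways: counting buffers and cache as used, and counting them as free, which is what the PARSE_1 line in vmstat.sh does. The function name mem_used_pct and the output labels are my own, not part of the Splunk add-on.

```shell
# Hypothetical helper (not part of the Splunk add-on): reads `vmstat -s`
# style lines on stdin and prints the used-memory percentage two ways.
mem_used_pct() {
    awk '
        /total memory$/  { total = $1 }
        /free memory$/   { free = $1 }
        /buffer memory$/ { buffers = $1 }
        /swap cache$/    { cached = $1 }
        END {
            # Naive view: everything that is not strictly free counts as used.
            naive = 100.0 * (total - free) / total
            # vmstat.sh view: buffers and cache are treated as free memory.
            adjusted = 100.0 * (total - free - buffers - cached) / total
            printf "naiveUsedPct=%.1f adjustedUsedPct=%.1f\n", naive, adjusted
        }'
}

# Values copied from the vmstat -s output in the question.
printf '%s\n' \
    '132045632 total memory' \
    '1907120 free memory' \
    '1185436 buffer memory' \
    '94279704 swap cache' | mem_used_pct
```

With those values this prints naiveUsedPct=98.6 adjustedUsedPct=26.3, which lines up with the 26.3 memUsedPct the script reported.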

The second line of your free output probably looks like this:

              total       used       free     shared    buffers     cached
 Mem:        128950     127061       1889          0       1157      92060
-/+ buffers/cache:       33844      95106

This second line is where the script's free/used numbers come from, because that's what is actually used by / available to applications.
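If you want to check that arithmetic yourself, a small sketch along these lines derives the -/+ buffers/cache figures from the Mem: line. It assumes the older free column layout shown above (total, used, free, shared, buffers, cached); newer versions of free print different columns and no longer show that second line. The function name buffers_cache_line is just something I made up for illustration.

```shell
# Hypothetical helper: derives the "-/+ buffers/cache" figures from the
# "Mem:" line of older `free -m` output
# (columns: total used free shared buffers cached).
buffers_cache_line() {
    awk '$1 == "Mem:" {
        used = $3; free = $4; buffers = $6; cached = $7
        # Used by applications = used minus reclaimable buffers and cache.
        # Available to applications = free plus those same buffers and cache.
        printf "-/+ buffers/cache: %d %d\n", used - buffers - cached, free + buffers + cached
    }'
}

# The Mem: line from the free -m output in the question.
echo 'Mem:        128950     127061       1889          0       1157      92060' | buffers_cache_line
```

For the line in the question this gives 33844 used and 95106 free, which is close to the memUsedMB and memFreeMB values the script printed (the small differences come from the two snapshots being taken at different times).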
