The free memory reported by the vmstat.sh script in both the *nix app and the TA_nix add-on is incorrect. I can fix the scripts to report accurate information, but I was wondering if anyone else has seen this and had to change it manually.
Here is the code block from the vmstat.sh for Linux:
DERIVE='END {memUsedMB=memTotalMB-memFreeMB; memUsedPct=(100.0*memUsedMB)/memTotalMB; memFreePct=100.0-memUsedPct; swapUsedPct=swapUsed ? (100.0*swapUsed)/(swapUsed+swapFree) : 0}'
if [ "x$KERNEL" = "xLinux" ] ; then
assertHaveCommand uptime
assertHaveCommand ps
assertHaveCommand vmstat
CMD='eval uptime ; ps -e | wc -l ; ps -eT | wc -l ; vmstat -s'
PARSE_0='NR==1 {loadAvg1mi=0+$(NF-2)} NR==2 {processes=$1} NR==3 {threads=$1}'
PARSE_1='/total memory$/ {memTotalMB=$1/1024} /free memory$/ {memFreeMB+=$1/1024} /buffer memory$/ {memFreeMB+=$1/1024} /swap cache$/ {memFreeMB+=$1/1024}'
PARSE_2='/pages paged out$/ {pgPageOut=$1} /used swap$/ {swapUsed=$1} /free swap$/ {swapFree=$1} /pages swapped out$/ {pgSwapOut=$1}'
PARSE_3='/interrupts$/ {interrupts=$1} /CPU context switches$/ {cSwitches=$1} /forks$/ {forks=$1}'
MASSAGE="$PARSE_0 $PARSE_1 $PARSE_2 $PARSE_3 $DERIVE"
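To make the math easier to follow, here is a minimal standalone sketch (the `vmstat_s_sample` helper is my own, and the canned numbers are copied from the outputs further down this post) that replays the PARSE_1 accumulation. Note that memFreeMB uses `+=`, so the free, buffer, and swap-cache lines are all summed into one "free" figure:

```shell
# Sketch only: replay the TA's PARSE_1 math against a canned vmstat -s
# sample. vmstat_s_sample is a hypothetical stand-in; values are in kB,
# copied from the real output below.
vmstat_s_sample() {
cat <<'EOF'
132282320 total memory
20094096 free memory
251012 buffer memory
104906704 swap cache
EOF
}

vmstat_s_sample | awk '
  /total memory$/  { memTotalMB   = $1 / 1024 }
  /free memory$/   { memFreeMB   += $1 / 1024 }  # same += accumulation
  /buffer memory$/ { memFreeMB   += $1 / 1024 }  # buffers counted as free
  /swap cache$/    { memFreeMB   += $1 / 1024 }  # page cache counted as free
  /free memory$/   { memFreeRawMB = $1 / 1024 }  # raw figure, for contrast
  END {
    printf "raw_free_mb=%d ta_free_mb=%d total_mb=%d\n",
           memFreeRawMB, memFreeMB, memTotalMB
  }'
```

The "ta_free" figure lands around 122 GB, matching what the script prints, while the raw free figure is closer to 19 GB.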
Here is the output from that on the CLI:
./vmstat.sh
memTotalMB memFreeMB memUsedMB memFreePct memUsedPct pgPageOut swapUsedPct pgSwapOut cSwitches interrupts forks processes threads loadAvg1mi
129181 122331 6850 94.7 5.3 83083638 0.0 5 2474094895 582278992 124945211 791 1270 0.09
And here is the output from just vmstat -s (which is what the script uses):
vmstat -s
132282320 total memory
112188224 used memory
67357104 active memory
40568500 inactive memory
20094096 free memory
251012 buffer memory
104906704 swap cache
134119416 total swap
20 used swap
134119392 free swap
63568055 non-nice user cpu ticks
439 nice user cpu ticks
27415594 system cpu ticks
1796620745 idle cpu ticks
44553858 IO-wait cpu ticks
19458 IRQ cpu ticks
2447076 softirq cpu ticks
0 stolen cpu ticks
76806178 pages paged in
83103846 pages paged out
0 pages swapped in
5 pages swapped out
583605396 interrupts
2475721333 CPU context switches
1377403815 boot time
124948563 forks
Here is output from free:
free
total used free shared buffers cached
Mem: 132282320 112170712 20111608 0 251164 104907988
-/+ buffers/cache: 7011560 125270760
Swap: 134119416 20 134119396
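As a sanity check (a sketch; the variables just hold kB figures copied from the free output above, nothing is queried live), the script's "free" turns out to be exactly the free-plus-buffers-plus-cached figure that free reports on its -/+ buffers/cache line:

```shell
# Sketch: reconcile the script's "free" with the free(1) output above.
# The kB values below are copied from that output, not queried live.
free_kb=20111608
buffers_kb=251164
cached_kb=104907988
avail_kb=$((free_kb + buffers_kb + cached_kb))
echo "free+buffers+cached = ${avail_kb} kB"
# compare with 125270760 kB, the "free" column on the -/+ buffers/cache line
```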
I can see the math going on in the DERIVE variable, but I can't make sense of it at all. I was trying to run a historical analysis on some Oracle DB servers that were having issues, and the *NIX app shows we have 93% RAM free, which is obviously wrong. Luckily we have Munin and Nagios, but it would have been nice to get all of this out of Splunk.
Just curious what everyone else is doing with this?
After spending some time troubleshooting these boxes, I see now what it is doing. It takes the cached memory and subtracts it from used, since it does not count cache as "used memory." I don't know if this is really the best method, since a lot of admins would like to know that the memory is being used for cache.
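If you do end up patching the script, one option (a sketch of my own, not the shipped TA code) is to keep the raw free, buffer, and cached figures as separate fields instead of folding them all into memFreeMB, so both views land in Splunk. The canned input below reuses the kB numbers from the vmstat -s output above:

```shell
# Hypothetical alternative parse (a sketch, not the shipped TA code):
# report raw free, buffers, and cache as separate fields so admins can
# see both "strictly free" and "reclaimable" memory.
PARSE_ALT='
  /total memory$/  { memTotalMB   = $1/1024 }
  /free memory$/   { memFreeMB    = $1/1024 }
  /buffer memory$/ { memBuffersMB = $1/1024 }
  /swap cache$/    { memCachedMB  = $1/1024 }
'
DERIVE_ALT='END {
  memUsedMB  = memTotalMB - memFreeMB - memBuffersMB - memCachedMB
  memFreePct = (100.0 * memFreeMB) / memTotalMB
  printf "memTotalMB=%d memFreeMB=%d memCachedMB=%d memUsedMB=%d memFreePct=%.1f\n",
         memTotalMB, memFreeMB, memCachedMB, memUsedMB, memFreePct
}'

# Canned sample (kB values copied from the vmstat -s output above)
printf '%s\n' \
  "132282320 total memory" \
  "20094096 free memory" \
  "251012 buffer memory" \
  "104906704 swap cache" |
awk "$PARSE_ALT $DERIVE_ALT"
```

With these sample numbers this reports roughly 15% strictly free alongside ~100 GB of cache, rather than the ~95% free the current math produces.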