All Apps and Add-ons

rlog.sh using too much CPU

gdiazlo
Engager

Hello

The way the script is constructed makes it consume a lot of CPU when the audit log is big (100 MB), because it reads the whole file from the start, skipping already-processed lines one by one.

I have this proposal:

TAIL_SIZE=$((FILE_LINES-SEEK))
if [ $TAIL_SIZE -gt 0 ]; then
   exec 3<&0
   exec 0<`tail -$TAIL_SIZE $AUDIT_FILE`
   while read -r line
   do
         echo $line | tee $TEE_DEST | /sbin/ausearch -i 2>/dev/null | grep -v '^----'
   done
   exec 0<&3
fi

This way resource usage is much lower.
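The idea behind the proposal (track how many lines were already processed in a seek file, then read only the new tail instead of scanning the whole file) can be sketched standalone. This is a minimal illustration, not the actual rlog.sh; the paths and log contents are placeholders:

```shell
#!/bin/sh
# Sketch of the seek-file pattern: on each run, emit only lines added
# since the previous run. Paths and data are illustrative.
LOG_FILE=/tmp/demo_audit.log
SEEK_FILE=/tmp/demo_audit.seek
rm -f "$SEEK_FILE"
printf 'line1\nline2\nline3\n' > "$LOG_FILE"

emit_new_lines() {
    # How many lines were processed on the previous run (0 if first run).
    [ -e "$SEEK_FILE" ] && SEEK=`head -1 "$SEEK_FILE"` || SEEK=0
    FILE_LINES=`wc -l < "$LOG_FILE"`
    # If the file shrank, assume it was rotated and start over.
    [ "$FILE_LINES" -lt "$SEEK" ] && SEEK=0
    TAIL_SIZE=$((FILE_LINES - SEEK))
    if [ "$TAIL_SIZE" -gt 0 ]; then
        tail -n "$TAIL_SIZE" "$LOG_FILE"   # emit only the unseen lines
        echo "$FILE_LINES" > "$SEEK_FILE"  # remember progress for next run
    fi
}

emit_new_lines                 # first run: prints all 3 lines
printf 'line4\n' >> "$LOG_FILE"
emit_new_lines                 # second run: prints only line4
```

The key point is that `tail` seeks from the end of the file, so the cost per run is proportional to the amount of new data, not to the total file size.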

Will this script be updated anytime soon?

regards,

gabriel

youngsuh
Contributor

@gdiazlo 

I am a Linux newbie. What information is this script supposed to add? We're getting an error severity with "<no matches>". What does that mean?

0 Karma

Lowell
Super Champion

Here's an additional optimization that also solves the UI issue where users get the error:

msg="A script exited abnormally" input="./bin/rlog.sh" stanza="default" status="exited with code 1"

This is based on Splunk's more recent release of this script, but changes two things:

  1. It checks whether the number of lines equals the seek value, so if there are no new lines to read, it doesn't bother running `awk`.
  2. It swaps out the superuser check for a file-readability check. This makes it possible to run as the unprivileged "splunk" user, as long as you've set up file system permissions appropriately.

    if [ "x$KERNEL" = "xLinux" ] ; then
        #assertInvokerIsSuperuser
        test -r "$AUDIT_FILE" || exit 1
        assertHaveCommand service
        assertHaveCommandGivenPath /sbin/ausearch
        if [ -n "`service auditd status`" -a "$?" -eq 0 ] ; then
            if [ -a $SEEK_FILE ] ; then
                SEEK=`head -1 $SEEK_FILE`
            else
                SEEK=1
                echo "0" > $SEEK_FILE
            fi
            FILE_LINES=`wc -l $AUDIT_FILE | cut -d " " -f 1`
            if [ $FILE_LINES -eq $SEEK ] ; then
                # No new events in audit.log
                exit 0
            fi
            if [ $FILE_LINES -lt $SEEK ] ; then
                # audit file has wrapped
                SEEK=0
            fi
            awk -v START=$SEEK -v OUTPUT=$SEEK_FILE 'NR>START { print } END { print NR > OUTPUT }' $AUDIT_FILE | tee $TEE_DEST | /sbin/ausearch -i 2>/dev/null | grep -v "^----"
        fi
    fi
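For reference, the `awk` one-liner above does the line-skipping and the seek-file update in a single pass: `NR>START` prints only lines past the saved offset, and the `END` block writes the new line count back. A standalone demonstration of just that technique (file contents and paths are illustrative):

```shell
#!/bin/sh
# Demonstrates the awk seek technique: print only lines past START,
# and record the new total line count in the OUTPUT seek file.
LOG=/tmp/awk_demo.log
SEEK_FILE=/tmp/awk_demo.seek
printf 'old1\nold2\nnew1\nnew2\n' > "$LOG"
SEEK=2   # pretend the first two lines were already processed

awk -v START=$SEEK -v OUTPUT=$SEEK_FILE \
    'NR>START { print } END { print NR > OUTPUT }' "$LOG"
# prints new1 and new2; the seek file now holds 4
```

Unlike the shell `while read` loop in older versions, this forks `awk` once per run instead of spawning `ausearch` and `grep` per line, which is where much of the CPU cost came from.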

flo_cognosec
Communicator

Just in case somebody stumbles across this sometime and tries to copy-paste the script.

It seems the line
if [ -a $SEEK_FILE ] ; then

needs to be replaced with
if [ -e $SEEK_FILE ] ; then

As the unary "-a" does not seem to work with newer bash versions.

peter_krammer
Communicator

Thank you for your contribution, but I would no longer suggest using my version of the script, because Splunk has already updated the script in their current app.

0 Karma

dwaddle
SplunkTrust
SplunkTrust

Good answers / workarounds. This is now officially fixed by Splunk in the *Nix App 5.0 -- http://apps.splunk.com/app/833/

peter_krammer
Communicator

We ran into the same problem and extended the previous solutions.
Now we do not use a loop at all.
Note, though, that the script now depends on tail being installed.

. `dirname $0`/common.sh

SEEK_FILE=$SPLUNK_HOME/var/run/splunk/unix_audit_seekfile
AUDIT_FILE=/var/log/audit/audit.log

if [ "x$KERNEL" = "xLinux" ] ; then
 assertInvokerIsSuperuser
 assertHaveCommand service
 assertHaveCommandGivenPath /sbin/ausearch
 if [ -n "`service auditd status`" -a "$?" -eq 0 ] ; then
            if [ -a $SEEK_FILE ] ; then
                SEEK=`head -1 $SEEK_FILE`
            else
                SEEK=0
                echo "0" > $SEEK_FILE
            fi
            FILE_LINES=`wc -l $AUDIT_FILE  | cut -d " " -f 1`
            if [ $FILE_LINES -lt $SEEK ] ; then
                # audit file has wrapped
                SEEK=0 
            fi
            wc -l $AUDIT_FILE  | cut -d " " -f 1 > $SEEK_FILE
            exec 3<&0
            exec 0<"$AUDIT_FILE"
            tail -n +$(($SEEK+1)) | tee $TEE_DEST | /sbin/ausearch -i 2>/dev/null | grep -v '^----'
            exec 0<&3
 fi
elif [ "x$KERNEL" = "xSunOS" ] ; then
 :
elif [ "x$KERNEL" = "xDarwin" ] ; then
 :
elif [ "x$KERNEL" = "xFreeBSD" ] ; then
 :
fi
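A note on the `tail -n +$(($SEEK+1))` line in the script above: `tail -n +K` starts output at line K (one-based), so with SEEK lines already processed, output should begin at line SEEK+1. A quick standalone illustration (file contents are placeholders):

```shell
#!/bin/sh
# "tail -n +K" prints from line K onward (1-based). With SEEK lines
# already processed, output must start at line SEEK+1.
printf '1\n2\n3\n4\n5\n' > /tmp/tail_demo.txt
SEEK=3                                      # three lines already seen
tail -n +$(($SEEK+1)) /tmp/tail_demo.txt    # prints lines 4 and 5
```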
0 Karma

peter_krammer
Communicator

I would no longer suggest using my version of the script, because Splunk has already updated the script in their current app.

0 Karma

JSapienza
Contributor

In my case the high CPU seemed to be coming from the COUNT and SEEK loop, so I commented out several lines and CPU returned to normal. Here is a copy of my rlog.sh:

#!/bin/sh                                                                                                
# Copyright 2011 Splunk, Inc.                                                                       
#                                                                                                        
#   Licensed under the Apache License, Version 2.0 (the "License");                                      
#   you may not use this file except in compliance with the License.                                     
#   You may obtain a copy of the License at                                                              
#                                                                                                        
#       http://www.apache.org/licenses/LICENSE-2.0                                                       
#                                                                                                        
#   Unless required by applicable law or agreed to in writing, software                                  
#   distributed under the License is distributed on an "AS IS" BASIS,                                    
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.                             
#   See the License for the specific language governing permissions and                                  
#   limitations under the License.      

. `dirname $0`/common.sh

SEEK_FILE=$SPLUNK_HOME/var/run/splunk/unix_audit_seekfile
AUDIT_FILE=/var/log/audit/audit.log

if [ "x$KERNEL" = "xLinux" ] ; then
 assertInvokerIsSuperuser
 assertHaveCommand service
 assertHaveCommandGivenPath /sbin/ausearch
 if [ -n "`service auditd status`" -a "$?" -eq 0 ] ; then
            if [ -a $SEEK_FILE ] ; then
                SEEK=`head -1 $SEEK_FILE`
            else
                SEEK=1
                echo "1" > $SEEK_FILE
            fi
            FILE_LINES=`wc -l $AUDIT_FILE  | cut -d " " -f 1`
            if [ $FILE_LINES -lt $SEEK ] ; then
                # audit file has wrapped
                SEEK=1 
            fi
            exec 3<&0
            exec 0<"$AUDIT_FILE"
            ##-# COUNT=0
            ##-# while read -r line
            tail -n +$SEEK | while read -r line
            do
                ##-# if [ $COUNT -lt $SEEK ] ; then
                ##-#     COUNT=`expr $COUNT + 1`
                ##-# else
                    echo $line | tee $TEE_DEST | /sbin/ausearch -i 2>/dev/null | grep -v '^----'
                ##-#     COUNT=`expr $COUNT + 1`
                ##-# fi 
            done
            exec 0<&3
            ##-# echo $COUNT > $SEEK_FILE
            wc -l $AUDIT_FILE  | cut -d " " -f 1 > $SEEK_FILE
 fi
elif [ "x$KERNEL" = "xSunOS" ] ; then
 :
elif [ "x$KERNEL" = "xDarwin" ] ; then
 :
elif [ "x$KERNEL" = "xFreeBSD" ] ; then
 :
fi
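One thing to watch with this variant: `tail -n +K` is one-based, so passing it the count of already-processed lines (as `tail -n +$SEEK` does here, since the seek file stores the full `wc -l` count) re-emits the last processed line on each subsequent run, whereas the earlier version's `tail -n +$((SEEK+1))` starts cleanly at the first new line. A quick illustration of the off-by-one (data is a placeholder):

```shell
#!/bin/sh
# Off-by-one illustration: if SEEK holds the number of lines already
# processed, "tail -n +$SEEK" repeats the last processed line, while
# "tail -n +$((SEEK+1))" starts at the first genuinely new line.
printf 'a\nb\nc\nd\n' > /tmp/seek_demo.txt
SEEK=2                                     # lines a and b already seen
tail -n +$SEEK /tmp/seek_demo.txt          # prints b, c, d (b repeated)
tail -n +$((SEEK+1)) /tmp/seek_demo.txt    # prints c, d
```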

kevin_hanser
New Member

So at this point it looks like there are a couple different solutions to this -- Your (JSapienza) solution regarding count and seek, and the original issue/solution submitted by gdiazlo. I wonder if we can get some input from splunk as to their thoughts on this; the *nix app is their app after all. I'd be interested to hear their take...

Perhaps I'll ask them as I believe I have access to support 🙂

Thanks for the information and quick response!

0 Karma

gdiazlo
Engager

correction:

TAIL_SIZE=$((FILE_LINES-SEEK))
if [ $TAIL_SIZE -gt 0 ]; then
    tail -$TAIL_SIZE $AUDIT_FILE | while read -r line
    do
        echo $line | tee $TEE_DEST | /sbin/ausearch -i 2>/dev/null | grep -v '^----'
    done
    echo $FILE_LINES > $SEEK_FILE
fi

as the other syntax wasn't correct.

0 Karma

kevin_hanser
New Member

I have also noticed high CPU usage on my universal forwarders that is coming from rlog.sh. Has there been an official fix/patch for this?

Right now our splunk is running in a dev/testing environment/configuration, so I was probably going to just temporarily disable the rlog.sh script until we can work this out.

So any ideas/suggestions/fixes/confirmations of fixes would be great!

thx

0 Karma

JSapienza
Contributor

I am having the same issue. Was this correction successful?

0 Karma