Getting Data In

Opinion on a frozen data deletion script that deletes data older than 12 months

splunkreal
Motivator

Hello guys,

I've created a shell script, scheduled with cron-like software, which deletes data older than 12 months, except for one special index (13 months). Could someone comment on it?

Calling method: ./script.sh /var/frozen/ 2

Note: if you use it, it's at your own risk 🙂

Thanks.

#!/bin/bash
# Frozen bucket detection and deletion

# $1 = first arg : main directory (frozen path)
# $2 = second arg : find depth

cd ~ || exit 1

curDateEpo=$(date +%s)
curDate=$(date -d "@$curDateEpo")

# 12 months (365 days) in seconds
beforeDateEpo=$((curDateEpo - 31536000))
# Special 13 months 10/09/2019
beforeDateEpoSpecial=$((curDateEpo - 34165800))

find "$1" -maxdepth "$2" -type d | while IFS= read -r i
do
    # Bucket directories are named db_<latest>_<earliest>_<id>; replacing
    # every non-digit with a space exposes the two epochs. This assumes the
    # path above the bucket contains no digits.
    read -r latest earliest _ <<< "${i//[^0-9]/ }"

    # Keep only the first 10 digits (epoch seconds)
    earliest=${earliest:0:10}

    if [[ $earliest =~ ^[0-9]+$ ]]
    then
        earliestH=$(date -d "@$earliest" +%Y/%m/%d)

        # Pick the retention threshold for this directory;
        # the special index keeps 13 months. Case Special (CASE SENSITIVE!!!)
        threshold=$beforeDateEpo
        if [[ $i == *"ppr_app_special/"* ]]
        then
            threshold=$beforeDateEpoSpecial
            printf '***** special detected in %s so applying %s period *****\n' "$i" "$threshold" >> splunk_frozen_script.log
        fi

        # Compare and purge
        if [ "$earliest" -lt "$threshold" ]
        then
            ### PURGE ###
            rm -r "$i"
            printf '%s;%s;%s;DELETED\n' "$curDate" "$i" "$earliestH" >> splunk_frozen_script.log
            echo "Some buckets have been deleted, logged in splunk_frozen_script.log"
        fi
    fi
done

printf '%s;%s;EXEC_FINISHED\n' "$curDate" "$curDate" >> splunk_frozen_script.log

Mehran_Safari
Explorer

Hi,

https://github.com/mehransafari/Splunk_Frozen_Cleanup

This script finds frozen logs older than X days and asks whether you want to remove them.

It may help you.
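For a rough sketch of the same idea using only find, something like the line below would list frozen bucket directories untouched for over a year and prompt before each removal. The path and 365-day cutoff are placeholders, and note that -mtime checks filesystem modification time, not the bucket's event epochs:

# prompt (-ok) before deleting each bucket directory older than ~365 days
find /var/frozen -maxdepth 2 -type d -name 'db_*' -mtime +365 -ok rm -r {} \;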


Powers64
Explorer

Here is a script I created that should do what you are asking! It also gives you the ability to delete from all indexes on a frozen path. The initial script came from seigex on Reddit: https://www.reddit.com/r/Splunk/comments/86lxao/script_to_delete_frozen_data_based_on_epoch_time/

The expected frozen path is {frozen-path}/{index-name}/{bucket_folder}. Be sure to insert your own log file path and frozen path.

#!/bin/bash
#Script Master: Powers64
#Schedule: Cronjob is set to run this script @ ...
#Purpose: Check each index in frozen storage for db buckets older than the retention date (by epoch), then delete them

TODAY=$(date)
#Useful if you want to monitor and build a dashboard from the logs; comma-delimited.
CLEAN_LOG=[.../removed_frozen_bucket_by_epoch.log]

#Offline retention is set to 1 year
RETENTION_BY_EPOCH=$(date --date="1 years ago" +%s)

echo "Retention is set to: $RETENTION_BY_EPOCH"
SPLUNK_FROZEN_PATH="[/frozen/mount/.../splunk_bucket_backups/frozen_buckets]"

##Pull the list of all index folders
cd "$SPLUNK_FROZEN_PATH" || exit 1
for line in $(ls -d -1 */ | cut -f1 -d'/'); do

    cd "$SPLUNK_FROZEN_PATH/$line/" || continue
    #Skip the folder if it is empty
    if [ "$(ls -A "$SPLUNK_FROZEN_PATH/$line/")" ];
    then
        #Check only db_* folders, avoiding in-flight folders
        for d in db_* ; do
            [ -e "$d" ] || continue

            #Bucket folders are named db_<latest>_<earliest>_<id>
            START_EPOCH="$(cut -d'_' -f3 <<<"$d")"
            END_EPOCH="$(cut -d'_' -f2 <<<"$d")"
            BUCKET_NUM="$(cut -d'_' -f4 <<<"$d")"
            BUCKET_SIZE="$(du -ch "$SPLUNK_FROZEN_PATH/$line/$d" | grep total | cut -b 1-4)"

            #Numeric comparison: delete when the newest event is past retention
            if [ "$END_EPOCH" -lt "$RETENTION_BY_EPOCH" ] && [ "$START_EPOCH" != 0 ];
            then
                echo "The following bucket will be deleted: $SPLUNK_FROZEN_PATH/$line/$d"
                echo -e "$TODAY,index=$line,bucket_folder=$d,bucket_num=$BUCKET_NUM,bucket_size=$BUCKET_SIZE,earliest_epoch=$START_EPOCH,latest_epoch=$END_EPOCH,set_retention=$RETENTION_BY_EPOCH" >> "$CLEAN_LOG"
                rm -rf "$SPLUNK_FROZEN_PATH/$line/$d"
            fi
        done
    fi
done
echo "Frozen Storage now adheres to retention policy!"

yZinou
Engager

Hello @Powers64,

Thank you for sharing this script.

I have a question, please: I'm a bit confused about why we should specify {frozen-path}/{index-name}/{bucket_folder} as the path if the script deletes from all indexes and not just a single one.

I see that the line ls -d -1 */ | cut -f1 -d'/' before the loop in your script lists all the indexes.

Can you please confirm that the path should be {frozen_path} (the directory where all the indexes with frozen buckets are)?

Thank you very much for your support.

Regards.


somesoni2
SplunkTrust

Any specific reason for not utilizing Splunk's native data retention settings?


splunkreal
Motivator

Hi, there is no native frozen data deletion script, or am I wrong? In our case we want 6 months online and 6 months archived for most indexes.


somesoni2
SplunkTrust

The default behavior of Splunk is to delete frozen buckets (there is no auto-archive). If you want data searchable for, say, 12 months, you set the retention period (frozenTimePeriodInSecs in indexes.conf): the data stays searchable for 12 months, and Splunk then deletes the buckets that have crossed the retention period. This can be set at the global level (so it applies to all indexes) and overridden at the index level (for that one exception index). Again, this works for the case where you want data to be searchable for the whole retention period.
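For illustration, a minimal indexes.conf sketch of that setup, reusing the 12- and 13-month values from the script above (the stanza name for the exception index is an assumption):

# indexes.conf
[default]
# 12 months, applies to all indexes
frozenTimePeriodInSecs = 31536000

[ppr_app_special]
# override for the one exception index (13 months)
frozenTimePeriodInSecs = 34165800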

When you say you archive data for 6 months, do you mean you keep the frozen buckets for 6 months and then delete them using this script?

splunkreal
Motivator

Yes, exactly.


somesoni2
SplunkTrust

You're doing a custom thing, so a custom script is probably the best idea here. One final thing: you archive after 6 months, so the data is no longer searchable after 6 months; the buckets are just kept so they can be restored if required. Would there be a problem if they stayed searchable? If you're looking for cost savings by reducing bucket size, that can be achieved with tsidx reduction (depending on which version of Splunk you're using).
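As a sketch, tsidx reduction is also configured per index in indexes.conf; the 6-month value below is an assumption chosen to match your 6-months-online window, so check the documentation for your Splunk version:

[your_index]
enableTsidxReduction = true
# reduce tsidx files in buckets older than ~6 months (180 days)
timePeriodInSecBeforeTsidxReduction = 15552000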

splunkreal
Motivator

That's a good question. We have different partitions and the data is split across HOT-WARM/COLD/FROZEN.

Is the script technically correct in your opinion?

Thanks 🙂
