
Frozen archives into Amazon S3

marksnelling
Communicator

Has anyone got a sample coldToFrozenScript that will copy frozen index archives to S3 before erasing them?

1 Solution

sbutto
Explorer

Hi @Splunk_rocks,

Can you please tell me what your Splunk setup looks like? What OS is Splunk installed on? How did you configure the indexer to run the script? Which Python packages did you add, and how?

But first, try using a full file path for log_file_path.
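
Something along these lines, assuming applyLogging.get_module_logger wants a full file name rather than a bare directory (the file name below is only a placeholder):

log_file_path = '/opt/splunk/var/log/splunk/splunk_archive.log'  # full file path, not just a directory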


Splunk_rocks
Path Finder

Thanks for checking in, @sbutto.
Mine is running on native Linux.
I have put your script under /opt/splunk/etc/apps/.
I have configured indexes.conf to run the script with Python, pointing at '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'.
Can you please help me with this? I'm stuck.
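
For reference, the documented way to wire this in is a coldToFrozenScript setting on the index stanza in indexes.conf, along these lines (the index name below is a placeholder):

[my_index]
coldToFrozenScript = "$SPLUNK_HOME/bin/python" "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py"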


Splunk_rocks
Path Finder

As of now it's just standalone Splunk running on a single-instance indexer.


Splunk_rocks
Path Finder

@sbutto, here are my inputs in the script:

# CHANGE THIS TO YOUR ACTUAL ARCHIVE DIRECTORY!!!
ARCHIVE_DIR = "/splunk/index/splunk/archiveindex"
ARCHIVE_DIR = os.path.join(os.getenv('SPLUNK_HOME'), 'frozenarchive')

script_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'
log_file_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/'

gnu_home_dir = ''  # where the gpg directory is. For example /home/s3/.gnupg/
gnu_home_dir = ' /home/splunkqa/ '

reciepient_email = ''  # the email the gpg uses to encrypt the files
reciepient_email = ' xyz@@@domain.com '

# Enabling the logging system
logger = applyLogging.get_module_logger(app_name='SplunkArchive', file_path=log_file_path)

# Finding out the epoch value at four months ago so we can compare the bucket timestamp against it.

FYI - my Splunk is a standalone indexer host; Splunk is installed under /opt/splunk, and my index storage is configured under /splunk/index.


Splunk_rocks
Path Finder

Here is the error I'm getting:

02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript File "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py", line 150
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript sys.stderr.write("mkdir warning: Directory '" + ARCHIVE_DIR + "' already exists\n")
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript ^
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript SyntaxError: invalid syntax
02-09-2019 13:51:36.715 -0500 ERROR BucketMover - coldToFrozenScript cmd='"/bin/python" "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py" /splunk/index/splunk/noindexdb/db/db_1543106304_1543105443_10' exited with non-zero status='exited with code 1


sbutto
Explorer

I have developed a script, coldToFrozenPlusS3Uplaod.py, that encrypts and uploads frozen buckets to S3.

It can be found here: https://github.com/marboxvel/Encrypt-upload-archived-Splunk-buckets
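
At a high level the flow is: tar the frozen bucket, GPG-encrypt the tarball, upload it to S3, then exit 0 so Splunk can delete the bucket. A minimal sketch of that general approach (not the repo's actual code; the bucket name, recipient, keyring path, key prefix, and credentials below are placeholders):

import os
import sys
import tarfile

import boto
import gnupg

S3_BUCKET = 'my-frozen-archive'        # placeholder bucket name
RECIPIENT = 'archive@example.com'      # placeholder GPG recipient
GNUPG_HOME = '/home/splunk/.gnupg'     # placeholder keyring location
AWS_KEY = 'xxxx'                       # placeholder credentials
AWS_SECRET = 'xxxx'

def main():
    bucket_dir = sys.argv[1]  # Splunk passes the frozen bucket path as the first argument
    name = os.uname()[1] + '_' + os.path.basename(bucket_dir)
    tarball = '/tmp/' + name + '.tar.gz'
    encrypted = tarball + '.gpg'

    # 1. Tar the whole bucket directory
    with tarfile.open(tarball, 'w:gz') as tar:
        tar.add(bucket_dir, arcname=os.path.basename(bucket_dir))

    # 2. Encrypt the tarball for the configured recipient
    gpg = gnupg.GPG(gnupghome=GNUPG_HOME)
    with open(tarball, 'rb') as f:
        gpg.encrypt_file(f, recipients=[RECIPIENT], output=encrypted)

    # 3. Upload the encrypted archive to S3
    conn = boto.connect_s3(AWS_KEY, AWS_SECRET)
    key = conn.get_bucket(S3_BUCKET).new_key('frozen/' + os.path.basename(encrypted))
    key.set_contents_from_filename(encrypted)

    # 4. Clean up; exiting 0 tells Splunk it is safe to remove the bucket
    os.remove(tarball)
    os.remove(encrypted)

if __name__ == '__main__':
    main()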


Splunk_rocks
Path Finder

Hey @sbutto, I'm using your coldToFrozenPlusS3Uplaod.py to upload to S3 but I'm running into issues; can anyone help me?
Here are the attributes I have added:

import sys, os, gzip, shutil, subprocess, random, gnupg
import boto
import datetime
import time
import tarfile

# applyLogging is a python script named applyLogging.py that exists at the same level as this script.
# If the file applyLogging.py doesn't exist where this file is located, the import statement will fail.
script_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'
sys.path.append(script_path)
import applyLogging

# CHANGE THIS TO YOUR ACTUAL ARCHIVE DIRECTORY!!!
ARCHIVE_DIR = "/splunk/index/splunk/archiveindex"
ARCHIVE_DIR = os.path.join(os.getenv('SPLUNK_HOME'), 'frozenarchive')

log_file_path = '/opt/splunk/var/log/splunk/'

gnu_home_dir = ''  # where the gpg directory is. For example /home/s3/.gnupg/
gnu_home_dir = '/home/splunkq/.gnupg'

reciepient_email = ''  # the email the gpg uses to encrypt the files
reciepient_email = 'xxyxy@gmail.com'

# Enabling the logging system
logger = applyLogging.get_module_logger(app_name='SplunkArchive', file_path=log_file_path)

# Finding out the epoch value at four months ago so we can compare the bucket timestamp against it.
# First we need to find today's epoch
today = round(time.mktime(datetime.datetime.today().timetuple()))

# Subtract 120 days
one_month_earlier = today - 120*86400

logger.info('Started on ' + str(datetime.datetime.today()))

# Getting the hostname so we can prefix the uploaded file name with it to distinguish buckets from different indexers.
hostname = os.uname()[1]

# S3 creds
AWS_ACCESS_KEY_ID = "xxxx"
AWS_ACCESS_KEY_SECRET = "xxxx"
AWS_BUCKET_NAME = "s3://zfu-splunk-pa/"

# Creating the gpg object
gpg = gnupg.GPG(gnupghome=gnu_home_dir)


awurster
Contributor

Hey @marksnelling - a bit late, but here is our sample script.

https://bitbucket.org/asecurityteam/atlassian-add-on-cold-to-frozen-s3/overview

There are a few assumptions, like IAM user keys or roles deployed to your nodes... but I've tested it successfully across a large index cluster.
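
On the credentials point: with an instance role attached to the node, boto (and awscli) can resolve credentials automatically, so no keys need to live in the script; with IAM user keys you pass them explicitly. A minimal illustration (the bucket name below is a placeholder):

import boto

# With an EC2 instance profile/role, boto resolves credentials on its own:
conn = boto.connect_s3()
# With IAM user keys instead, pass them explicitly:
# conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY_SECRET)
archive_bucket = conn.get_bucket('my-frozen-archive')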


esix_splunk
Splunk Employee

Nice script. On a side note, you might look at awscli instead of s3cmd. It's an officially supported binary, along with being multithreaded (better performance!)
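
For example, a coldToFrozen script could simply shell out to the AWS CLI and copy the whole frozen bucket directory (the destination bucket and prefix below are placeholders):

import os
import subprocess
import sys

bucket_dir = sys.argv[1]  # frozen bucket path passed in by Splunk
dest = 's3://my-frozen-archive/%s/%s/' % (os.uname()[1], os.path.basename(bucket_dir))
subprocess.check_call(['aws', 's3', 'cp', bucket_dir, dest, '--recursive'])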


awurster
Contributor

Sure, anything is possible. But I would be more interested to see Splunk and AWS come together to make something more legit than what we've hacked together in 30 minutes.


dwaddle
SplunkTrust

awurster
Contributor

I downvoted this post because this approach is no longer officially supported and has too many dependencies attached (Java, etc.).


bchen
Splunk Employee

For more info on Shuttl setup, see: https://github.com/splunk/splunk-shuttl/wiki/Quickstart-Guide


esix_splunk
Splunk Employee

Shuttl is deprecated and no longer in development, and I believe it won't work with > 6.2 due to Python library incompatibilities.

At this point, you'd be better off rolling to S3 with s3cmd or s3cli as a script. In the future, perhaps there will be more functionality to include this as a roll-to-cold/frozen feature.


bchen
Splunk Employee

Hadoop does not need to be installed on the Splunk Indexer. If the data is in S3, then you can use the standard ways of deploying Hadoop to operate on the data there. See a discussion here: http://stackoverflow.com/questions/4092852/i-cant-get-hadoop-to-start-using-amazon-ec2-s3

Also keep in mind that if you want to use the data in Hadoop, you will want to archive in CSV format. If you want the data to come back to Splunk, you can bring the CSV data back (however, it may incur compute load on import), or for more efficient index restoration, store in Splunk Bucket format.
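
On the CSV side: buckets can be dumped to CSV with the exporttool utility that ships with Splunk; the exact invocation varies by version, but it is roughly of this shape (the bucket and output paths below are placeholders):

splunk cmd exporttool /splunk/index/splunk/noindexdb/frozendb/db_1543106304_1543105443_10 /tmp/db_1543106304_1543105443_10.csv -csv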


marksnelling
Communicator

This looks promising, but I'm not sure how it's deployed. Do I install Hadoop on my Splunk indexer and map it to S3, or does it need to be installed in EC2 and access S3 that way?
I'm assuming Hadoop is required for S3, BTW.
