
Frozen archives into Amazon S3

marksnelling
Communicator

Has anyone got a sample coldToFrozenScript that will copy frozen index archives to S3 before erasing them?

1 Solution

sbutto
Explorer

Hi @Splunk_rocks,

Can you please tell me what your Splunk setup looks like? What OS is Splunk installed on? How did you configure the indexer to run the script? Which Python packages did you add, and how?

But first, try using a full file path for log_file_path.
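
Something along these lines, assuming applyLogging.get_module_logger wants a full file name rather than a bare directory (the file name below is only a placeholder):

log_file_path = '/opt/splunk/var/log/splunk/splunk_archive.log'  # full file path, not just a directory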


Splunk_rocks
Path Finder

Thanks for checking in, @sbutto.
Mine is running on native Linux.
I have put your script under /opt/splunk/etc/apps/.
I have configured indexes.conf to run the script with Python, pointing at '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'.
Can you please help me with this? I'm stuck.
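
For reference, the documented way to wire this in is a coldToFrozenScript setting on the index stanza in indexes.conf, along these lines (the index name below is a placeholder):

[my_index]
coldToFrozenScript = "$SPLUNK_HOME/bin/python" "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py"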


Splunk_rocks
Path Finder

As of now it's just standalone Splunk running on a single-instance indexer.


Splunk_rocks
Path Finder

@sbutto, here are my inputs in the script:

# CHANGE THIS TO YOUR ACTUAL ARCHIVE DIRECTORY!!!
ARCHIVE_DIR = "/splunk/index/splunk/archiveindex"
ARCHIVE_DIR = os.path.join(os.getenv('SPLUNK_HOME'), 'frozenarchive')

script_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'
log_file_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/'

gnu_home_dir = ''  # where the gpg directory is. For example /home/s3/.gnupg/
gnu_home_dir = ' /home/splunkqa/ '

reciepient_email = ''  # the email the gpg uses to encrypt the files
reciepient_email = ' xyz@@@domain.com '

# Enabling the logging system
logger = applyLogging.get_module_logger(app_name='SplunkArchive', file_path=log_file_path)

# Finding out the epoch value at four months ago so we can compare the bucket timestamp against it.

FYI - my Splunk is a standalone indexer host; Splunk is installed under /opt/splunk, and my index storage is configured under /splunk/index.


Splunk_rocks
Path Finder

Here is the error I'm getting:

02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript File "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py", line 150
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript sys.stderr.write("mkdir warning: Directory '" + ARCHIVE_DIR + "' already exists\n")
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript ^
02-09-2019 13:51:36.711 -0500 ERROR BucketMover - coldToFrozenScript SyntaxError: invalid syntax
02-09-2019 13:51:36.715 -0500 ERROR BucketMover - coldToFrozenScript cmd='"/bin/python" "/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py" /splunk/index/splunk/noindexdb/db/db_1543106304_1543105443_10' exited with non-zero status='exited with code 1


sbutto
Explorer

I have developed a script, coldToFrozenPlusS3Uplaod.py, that encrypts and uploads frozen buckets to S3.

It can be found here: https://github.com/marboxvel/Encrypt-upload-archived-Splunk-buckets
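
At a high level the flow is: tar the frozen bucket, GPG-encrypt the tarball, upload it to S3, then exit 0 so Splunk can delete the bucket. A minimal sketch of that general approach (not the repo's actual code; the bucket name, recipient, keyring path, key prefix, and credentials below are placeholders):

import os
import sys
import tarfile

import boto
import gnupg

S3_BUCKET = 'my-frozen-archive'        # placeholder bucket name
RECIPIENT = 'archive@example.com'      # placeholder GPG recipient
GNUPG_HOME = '/home/splunk/.gnupg'     # placeholder keyring location
AWS_KEY = 'xxxx'                       # placeholder credentials
AWS_SECRET = 'xxxx'

def main():
    bucket_dir = sys.argv[1]  # Splunk passes the frozen bucket path as the first argument
    name = os.uname()[1] + '_' + os.path.basename(bucket_dir)
    tarball = '/tmp/' + name + '.tar.gz'
    encrypted = tarball + '.gpg'

    # 1. Tar the whole bucket directory
    with tarfile.open(tarball, 'w:gz') as tar:
        tar.add(bucket_dir, arcname=os.path.basename(bucket_dir))

    # 2. Encrypt the tarball for the configured recipient
    gpg = gnupg.GPG(gnupghome=GNUPG_HOME)
    with open(tarball, 'rb') as f:
        gpg.encrypt_file(f, recipients=[RECIPIENT], output=encrypted)

    # 3. Upload the encrypted archive to S3
    conn = boto.connect_s3(AWS_KEY, AWS_SECRET)
    key = conn.get_bucket(S3_BUCKET).new_key('frozen/' + os.path.basename(encrypted))
    key.set_contents_from_filename(encrypted)

    # 4. Clean up; exiting 0 tells Splunk it is safe to remove the bucket
    os.remove(tarball)
    os.remove(encrypted)

if __name__ == '__main__':
    main()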


Splunk_rocks
Path Finder

Hey @sbutto, I'm using your coldToFrozenPlusS3Uplaod.py to upload to S3 but I'm running into issues; can anyone help me?
Here are the attributes I have added:

import sys, os, gzip, shutil, subprocess, random, gnupg
import boto
import datetime
import time
import tarfile

# applyLogging is a python script named applyLogging.py that exists at the same level as this script.
# If the file applyLogging.py doesn't exist where this file is located, the import statement will fail.
script_path = '/opt/splunk/etc/apps/Encrypt-upload-archived-Splunk-buckets-master/coldToFrozenPlusS3Uplaod.py'
sys.path.append(script_path)
import applyLogging

# CHANGE THIS TO YOUR ACTUAL ARCHIVE DIRECTORY!!!
ARCHIVE_DIR = "/splunk/index/splunk/archiveindex"
ARCHIVE_DIR = os.path.join(os.getenv('SPLUNK_HOME'), 'frozenarchive')

log_file_path = '/opt/splunk/var/log/splunk/'

gnu_home_dir = ''  # where the gpg directory is. For example /home/s3/.gnupg/
gnu_home_dir = '/home/splunkq/.gnupg'

reciepient_email = ''  # the email the gpg uses to encrypt the files
reciepient_email = 'xxyxy@gmail.com'

# Enabling the logging system
logger = applyLogging.get_module_logger(app_name='SplunkArchive', file_path=log_file_path)

# Finding out the epoch value at four months ago so we can compare the bucket timestamp against it.
# First we need to find today's epoch
today = round(time.mktime(datetime.datetime.today().timetuple()))

# Subtract 120 days
one_month_earlier = today - 120*86400

logger.info('Started on ' + str(datetime.datetime.today()))

# Getting the hostname so we can prefix the uploaded file name with it to distinguish buckets from different indexers.
hostname = os.uname()[1]

# S3 creds
AWS_ACCESS_KEY_ID = "xxxx"
AWS_ACCESS_KEY_SECRET = "xxxx"
AWS_BUCKET_NAME = "s3://zfu-splunk-pa/"

# Creating the gpg object
gpg = gnupg.GPG(gnupghome=gnu_home_dir)


awurster
Contributor

Hey @marksnelling - a bit late, but here is our sample script.

https://bitbucket.org/asecurityteam/atlassian-add-on-cold-to-frozen-s3/overview

There are a few assumptions, like IAM user keys or roles deployed to your nodes... but I've tested it successfully across a large index cluster.
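
On the credentials point: with an instance role attached to the node, boto (and awscli) can resolve credentials automatically, so no keys need to live in the script; with IAM user keys you pass them explicitly. A minimal illustration (the bucket name below is a placeholder):

import boto

# With an EC2 instance profile/role, boto resolves credentials on its own:
conn = boto.connect_s3()
# With IAM user keys instead, pass them explicitly:
# conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY_SECRET)
archive_bucket = conn.get_bucket('my-frozen-archive')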


esix_splunk
Splunk Employee

Nice script. On a side note, you might look at awscli instead of s3cmd. It's an officially supported binary, along with being multithreaded (better performance!)
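
For example, a coldToFrozen script could simply shell out to the AWS CLI and copy the whole frozen bucket directory (the destination bucket and prefix below are placeholders):

import os
import subprocess
import sys

bucket_dir = sys.argv[1]  # frozen bucket path passed in by Splunk
dest = 's3://my-frozen-archive/%s/%s/' % (os.uname()[1], os.path.basename(bucket_dir))
subprocess.check_call(['aws', 's3', 'cp', bucket_dir, dest, '--recursive'])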


awurster
Contributor

Sure, anything is possible. But I would be more interested to see Splunk and AWS come together to make something more legit than what we've hacked together in 30 minutes.


dwaddle
SplunkTrust

awurster
Contributor

I downvoted this post because this approach is no longer officially supported and has too many dependencies attached (Java, etc.).


bchen
Splunk Employee

For more info on Shuttl setup, see: https://github.com/splunk/splunk-shuttl/wiki/Quickstart-Guide


esix_splunk
Splunk Employee

Shuttl is deprecated and no longer in development, and I believe it won't work with > 6.2 due to Python library incompatibilities.

At this point, you'd be better off rolling to S3 with s3cmd or s3cli as a script. In the future, perhaps there will be more functionality to include this as a roll-to-cold/frozen feature.


bchen
Splunk Employee

Hadoop does not need to be installed on the Splunk Indexer. If the data is in S3, then you can use the standard ways of deploying Hadoop to operate on the data there. See a discussion here: http://stackoverflow.com/questions/4092852/i-cant-get-hadoop-to-start-using-amazon-ec2-s3

Also keep in mind that if you want to use the data in Hadoop, you will want to archive in CSV format. If you want the data to come back to Splunk, you can bring the CSV data back (however, it may incur compute load on import), or for more efficient index restoration, store in Splunk Bucket format.
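
On the CSV side: buckets can be dumped to CSV with the exporttool utility that ships with Splunk; the exact invocation varies by version, but it is roughly of this shape (the bucket and output paths below are placeholders):

splunk cmd exporttool /splunk/index/splunk/noindexdb/frozendb/db_1543106304_1543105443_10 /tmp/db_1543106304_1543105443_10.csv -csv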


marksnelling
Communicator

This looks promising, but I'm not sure how it's deployed. Do I install Hadoop on my Splunk indexer and map it to S3, or does it need to be installed in EC2 and access S3 that way?
I'm assuming Hadoop is required for S3, BTW.
