How to configure a new Splunk instance to search previously indexed data stored on S3?

Log_wrangler
Builder

I have previously indexed data uploaded to an S3 bucket.

I installed Splunk (full version) on an EC2 instance (RHEL7).
I (persistently) mounted the S3 bucket on the EC2 instance (with FUSE).
I can see all the data when I change to my_s3fs_mount_directory,

(e.g. /my_s3fs_mount_directory/index_name/db_1234567_123456_1234/rawdata/journal.gz)

My question is how to edit indexes.conf correctly, so that my new indexer sees this data and doesn't accidentally overwrite the existing data in that path.

Here is what I have so far (in /opt/splunk/etc/system/local/):

[myindex]
homePath = /my_s3fs_mount_directory/index_name/db
coldPath = /my_s3fs_mount_directory/index_name/colddb
thawedPath = /my_s3fs_mount_directory/index_name/thaweddb
maxDataSize = 10000
maxHotBuckets = 10

The index is visible, but searches return no data.

Is there anything else I need to do or another conf I would also need to edit?

Any advice is appreciated.
Thank you


nickhills
Ultra Champion

S3 over FUSE is S. L. O. W., as well as being a fake filesystem.

I would mount an EBS volume and copy the data from S3 to it before doing anything else.

If my comment helps, please give it a thumbs up!

Log_wrangler
Builder

Your suggestion is probably the best solution at this point.

My current scenario was a test to see if it would read, and apparently it will not (as you mentioned, s3fs is slow, object-based, and not listed as supported).

For those interested, I started another thread (title below) to see whether the Splunk 7.0 remotePath setting may be a solution.

"has anyone successful setup the remotePath option in indexes.conf in Splunk 7.0 to work with indexed data in s3?"


Log_wrangler
Builder

FYI, I was able to read a test file.txt from the /s3fs dir, but only as a "data input".

I could read file.txt via Data inputs > Files & directories > New (then selecting /s3fs/file.txt).

Of course, this would need to be automated to ingest large numbers of files. I have not worked that out, so any suggestions are appreciated.


nickhills
Ultra Champion

I saw it!
I too am super interested in this, but as I noted, I suspect it will only be for archive data.


nickhills
Ultra Champion

Is the data a 'copy' of the indexes which you have uploaded to S3, or was the data frozen?

If the data was frozen, you need to copy the buckets to the thawed directory, not the hot/cold db.
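
A rough way to tell the two cases apart is to look inside a bucket directory. The sketch below assumes that archived (frozen) buckets contain only the rawdata journal and have lost their .tsidx index files, while a straight copy of a warm or cold bucket still has them. The function name needs_rebuild is hypothetical, and this is a heuristic rather than an official Splunk check.

```python
from pathlib import Path


def needs_rebuild(bucket_dir):
    """Heuristic: True if the bucket has rawdata/journal.gz but no .tsidx files.

    Assumption: such a bucket looks like a frozen archive and would have to be
    placed in the thawed directory and rebuilt before it is searchable.
    """
    bucket = Path(bucket_dir)
    has_journal = (bucket / "rawdata" / "journal.gz").is_file()
    has_tsidx = any(bucket.glob("*.tsidx"))
    return has_journal and not has_tsidx
```

Calling needs_rebuild("/my_s3fs_mount_directory/index_name/db_1234567_123456_1234") on a local copy of the bucket from the question would return True only if the journal is present and no .tsidx files sit next to the rawdata directory.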


Log_wrangler
Builder

It was a copy of warm and cold buckets.


micahkemp
Champion

One thing you want to be very careful with is getting your frozenTimePeriodInSecs and maxTotalDataSizeMB correct before you point Splunk at an existing index location. If either is wrong, you risk Splunk deciding that data needs to be frozen (which in most cases really means deleted).
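
To make the time-based half of that risk concrete, here is a minimal sketch that flags buckets whose newest event is already older than a given frozenTimePeriodInSecs. It assumes the usual bucket directory naming of db_<newest_epoch>_<oldest_epoch>_<id>, as in the path from the question. The function name buckets_at_risk is hypothetical, and the size-based limit (maxTotalDataSizeMB) is not covered.

```python
import re
import time

# Assumed naming: db_<newest_epoch>_<oldest_epoch>_<id> (rb_ for replicated copies).
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)")


def buckets_at_risk(bucket_names, frozen_time_period_in_secs, now=None):
    """Return bucket names whose newest event is older than the retention period."""
    now = time.time() if now is None else now
    cutoff = now - frozen_time_period_in_secs
    at_risk = []
    for name in bucket_names:
        match = BUCKET_RE.match(name)
        if match is None:
            continue  # hot buckets and unrelated entries are skipped
        newest = int(match.group(1))
        if newest < cutoff:
            at_risk.append(name)
    return at_risk
```

Feeding it the directory listing of the copied index (for example from os.listdir) together with the retention value you intend to configure gives a quick list of buckets that would be candidates for freezing as soon as Splunk starts.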


Log_wrangler
Builder

After reviewing some other posts...

It is quite possible that the S3 object-based data is simply not compatible with Splunk without some custom code to make it readable.

I am using an old version of Splunk (i.e. 5.x).

I am thinking that I will try Splunk 7.x and see if it can read indexed data from a remote s3 location.

Please advise if you have any more insight on this. If/when I get results, I plan to share lessons learned.

Thank you


micahkemp
Champion

Is this a standalone Splunk instance (or are you trying to search directly from the instance that has the data mounted)?

Can you post the output of splunk btool indexes list --debug?


Log_wrangler
Builder

Sorry, our security policy does not permit me to post actual data. Thanks.


Log_wrangler
Builder

This is a standalone Splunk instance on RHEL7 on an AWS EC2 instance.

I created a custom index which points to the S3 path...

When I restarted after creating the indexes.conf file for this index, I got this error:

error message for /my_s3fs_mount_directory/...

homePath '/my_s3fs_mount_directory/index_name/db' is in a filesystem that Splunk cannot use. (index=index_name)

Checking indexes...
homePath '//my_s3fs_mount_directory/index_name/db' is in a filesystem that Splunk cannot use. (index=index_name)
Validating databases (splunkd validatedb) failed with code '1'.
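
Since the message is about the filesystem under homePath, it can help to confirm what filesystem type the path actually sits on. The sketch below parses text in the /proc/mounts format and returns the mount point and type for a path. The function name filesystem_type and the sample mount lines are illustrative assumptions, not output from the poster's system, and the sketch does not know which types Splunk accepts.

```python
import os

# Illustrative sample in /proc/mounts format (assumed, not real output).
SAMPLE_MOUNTS = """\
/dev/xvda2 / xfs rw,relatime 0 0
s3fs /my_s3fs_mount_directory fuse.s3fs rw,nosuid,nodev 0 0
"""


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) of the longest mount point containing path."""
    # Collapse repeated or doubled leading slashes; path is assumed absolute.
    path = "/" + os.path.normpath(path).lstrip("/")
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best
```

On the instance itself, the text would come from reading /proc/mounts. A FUSE type such as fuse.s3fs for the homePath is consistent with the startup error above, whereas a path on a local EBS-backed volume would report a regular disk filesystem.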
