Getting Data In

Question on indexes and retention policy

vrmandadi
Builder

I have 5 indexers in a clustered environment with a replication factor of 3. We have a 350 GB license and our daily average is around 310 GB per day.
We have indexes created with a retention policy of 365 days. Below is a sample index configuration:
[main]
coldPath = $SPLUNK_DB/main/colddb
homePath = $SPLUNK_DB/main/db
thawedPath = $SPLUNK_DB/main/thaweddb
repFactor = auto
frozenTimePeriodInSecs = 31536000

I am trying to view data from last year, but I don't see any results except for 10 to 20 events, even for January, February, March, and April of this year. We have received multiple alerts about high disk usage (almost 99 percent) many times from all the indexers. I have three indexers with 125.64 GB of RAM each and two others with 503.62 GB each.

1) How can I view the old data?
2) What happens to the data if disk usage is high?
3) Does data roll from hot to cold immediately if disk usage is full?
4) What changes should I make in order to see the data when I search over a year?

Please advise, and apologies if these questions are silly.


woodcock
Esteemed Legend

Splunk will freeze (which usually means delete) buckets based on EITHER time OR size, whichever threshold is reached first. It is clear here that you have hit the size threshold, and the data is not searchable because the buckets containing it have been deleted. You need more space. Here is a sizing tool that, while not perfect, is actually pretty good and we use it frequently. Be sure to use the fudge-factor feature:
https://splunk-sizing.appspot.com/
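
For example, here is a rough sketch of the stanza from the question with both limits made explicit in indexes.conf (the maxTotalDataSizeMB value is purely a placeholder, to be sized from the tool above):

[main]
homePath = $SPLUNK_DB/main/db
coldPath = $SPLUNK_DB/main/colddb
thawedPath = $SPLUNK_DB/main/thaweddb
repFactor = auto
# time-based limit: freeze events older than 1 year
frozenTimePeriodInSecs = 31536000
# size-based limit: if left unset this defaults to 500000 MB (~500 GB) per indexer,
# and the index freezes its oldest buckets as soon as that cap is hit,
# regardless of the time limit
maxTotalDataSizeMB = 1000000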


adonio
Ultra Champion

hello there:
1. splunk index retention is set by size or time (to frozen), whichever comes first. so for example, let's say you have an index set for 1 year by time and you are indexing 1 GB per day: if you set the index size to 500 GB, the time limit will take effect; if you set the size to 200 GB, you will only have about 200 days' worth of data (see the sketch after this list). Note, in the example above I purposely simplified the calculations and didn't apply any compression factors or clustering
2. if disk is full, splunk will roll out (freeze) old buckets from indexes as it brings new data in
3. splunk rolls buckets from hot to warm to cold to frozen, see the links below for a more detailed explanation
4. you will need enough disk to support your retention requirements, based on GB indexed and the replication and search factors from clustering
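
for example, a rough indexes.conf sketch of the scenario in point 1 (the index name and sizes are purely illustrative, and ignore compression and clustering just like the example above):

[example_1gb_per_day]
# time limit: freeze events older than 1 year
frozenTimePeriodInSecs = 31536000
# size limit: 200 GB -> at ~1 GB/day the index hits this cap after roughly 200 days,
# so the size limit freezes data long before the 1-year time limit is reached
maxTotalDataSizeMB = 200000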

further reading here:
https://conf.splunk.com/files/2017/slides/splunk-data-life-cycle-determining-when-and-where-to-roll-...
https://wiki.splunk.com/Deploy:BucketRotationAndRetention
https://docs.splunk.com/Documentation/Splunk/7.3.0/Admin/Indexesconf

hope it helps


vrmandadi
Builder

Thank you @adonio for your reply. So in our environment, where we index around 300 GB per day with a replication factor of 3, the total data is around 1000 GB per day (approx). So to view data for a year it is 1000 multiplied by 365, and that is the amount of disk space, combined across all indexers, we should have in order to see one year of data.
If we reduce the replication factor to 2, then it would be 500 GB per day, which is 182.5 TB of storage capacity.

Did I get that right?


adonio
Ultra Champion

nope,

if you bring in 300 GB per day, the calculation goes like this:

300 GB x 0.5 (avg compression ratio) = 150 GB per day committed to disk (this is 1 copy of the data)
each replicated (rawdata) copy is ~15% of the raw data size
each searchable copy's index files are ~35% of the raw data size
RF = 3 -> 3 x 15% = 45% of raw data
SF = 2 (I assume) -> 2 x 35% = 70% of raw data
combined: 115% of raw data
300 GB per day x 115% = 345 GB per day to disk to meet replication and search requirements
345 GB per day x 365 days in a year = 125,925 GB, roughly 126 TB (call it ~130 TB) of disk across all your indexers
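
or, as a general rule of thumb using the same assumptions as above (rawdata copies ~15% and searchable index files ~35% of raw volume):

disk per day = daily raw GB x (RF x 0.15 + SF x 0.35)
total disk   = disk per day x retention days
e.g. 300 x (3 x 0.15 + 2 x 0.35) = 300 x 1.15 = 345 GB/day, and 345 x 365 ≈ 126 TB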

this link can be of help too:
https://splunk-sizing.appspot.com/


vrmandadi
Builder

Thank you for your detailed explanation. We have a few sources, like Palo Alto logs, AWS CloudTrail logs, and Windows event logs, from which we get almost 100 GB of data per day. How does the retention policy work here? My question is: if 100 GB of incoming data per day adds up to a total of 500 GB after 5 days, how many days of data will I have if the index size is 500 GB?

The other question is: if two sources each bring in around 100 GB and they both go to the same index, does creating a new index for each of them, or for each sourcetype that is getting a large volume of data, help with the retention policy?


adonio
Ultra Champion

now you are talking about indexes configuration,
please read indexes.conf.spec and the pdf I referenced above in detail
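
as a very rough illustration only (the index names, paths, and sizes below are hypothetical, not a recommendation): giving each high-volume source its own index lets each one carry its own retention settings, because the time and size limits are evaluated per index:

[pan_logs]
homePath = $SPLUNK_DB/pan_logs/db
coldPath = $SPLUNK_DB/pan_logs/colddb
thawedPath = $SPLUNK_DB/pan_logs/thaweddb
repFactor = auto
# keep firewall data for 1 year, capped at ~300 GB per indexer
frozenTimePeriodInSecs = 31536000
maxTotalDataSizeMB = 300000

[wineventlog]
homePath = $SPLUNK_DB/wineventlog/db
coldPath = $SPLUNK_DB/wineventlog/colddb
thawedPath = $SPLUNK_DB/wineventlog/thaweddb
repFactor = auto
# keep windows event logs for 90 days, capped at ~100 GB per indexer
frozenTimePeriodInSecs = 7776000
maxTotalDataSizeMB = 100000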
