Why is a particular index-volume (per day) increasing?

dban2005
New Member

I recently added a file share for indexing through a Universal Forwarder on a Windows server, which sends to the receiver/deployment server (Linux). Yesterday the total volume of raw data on the file share was 20 GB; today it is 21 GB. The corresponding indexed volume for that file share was 7 GB yesterday (per-day volume) and 7.123 GB today (again a per-day volume), which is far larger than the 1 GB increase in raw data. In inputs.conf, I set the global parameter ignoreOlderThan = 7d. Is Splunk re-indexing the last 7 days of data every 24 hours, or is something else going on? How can I determine the cause and avoid it? Note: there are no zip files, and .xml files have been excluded from indexing.
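
To put numbers on the mismatch, here is a small sketch of the arithmetic. The function and variable names are only illustrative; the figures are the ones quoted above.

```python
def excess_indexed_gb(raw_before_gb, raw_after_gb, indexed_gb):
    """Return how much more was indexed in a day than the raw data grew."""
    raw_growth_gb = raw_after_gb - raw_before_gb
    return indexed_gb - raw_growth_gb

# Raw data grew from 20 GB to 21 GB, but 7.123 GB was indexed that day.
excess = round(excess_indexed_gb(20.0, 21.0, 7.123), 3)
print(f"Indexed volume exceeds raw growth by {excess} GB")
```

A large positive result only says that more was indexed than newly arrived; it does not say whether the extra volume is a backlog being worked off or data being read twice.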

lguinn2
Legend

How much of the data was Splunk able to index on the first day? Of the 20 GB, how much should Splunk have indexed, and how much did it actually index? I wonder if Splunk is "catching up."

ignoreOlderThan=7d will not cause the data to be indexed twice.
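
As a rough illustration of why: ignoreOlderThan acts as a filter on a file's modification time, not as a window that gets re-read every day. The sketch below is a simplified model of that check, not Splunk's actual implementation; the function name, file names, and dates are all made up for the example.

```python
from datetime import datetime, timedelta

def is_skipped(file_mtime, now, ignore_older_than):
    """Model of the check: skip files last modified longer ago than the threshold."""
    return now - file_mtime > ignore_older_than

now = datetime(2024, 3, 10, 12, 0)
threshold = timedelta(days=7)  # corresponds to ignoreOlderThan = 7d

# Hypothetical files on the share and their last-modified times.
files = {
    "old_report.log": datetime(2024, 2, 1, 9, 0),
    "recent.log": datetime(2024, 3, 9, 18, 0),
}

# Only files modified within the threshold are still considered by the monitor.
monitored = [name for name, mtime in files.items()
             if not is_skipped(mtime, now, threshold)]
print(monitored)
```

In this model the setting only decides which files are looked at. Whether a file that passes the filter has already been read is tracked separately, so passing the filter on consecutive days does not by itself mean the file is indexed again.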

I would turn on the Monitoring Console and look at the tabs on indexing for information, as a starting point.

dban2005
New Member

Day 1: 7 GB; day 2: 7.531 GB; day 3: 1.45 GB (we disabled the index, changed ignoreOlderThan to 2d, and redeployed); day 4: 6.421 GB.
How can I tell whether the index is duplicating data?
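
One possible way to check, sketched in Python: export a sample of events from the index and count how many times the same raw text appears for the same source. This assumes the events have already been exported as (source, raw text) pairs; the function name and the sample data are hypothetical.

```python
import hashlib
from collections import Counter

def find_duplicates(events):
    """Count identical raw events per source.

    events: iterable of (source, raw_text) pairs.
    Returns {(source, md5_digest): count} for entries seen more than once.
    """
    counts = Counter()
    for source, raw in events:
        digest = hashlib.md5(raw.encode("utf-8")).hexdigest()
        counts[(source, digest)] += 1
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical exported sample: the first two events are identical.
sample = [
    ("share/a.log", "10:00:00 user=alice action=open"),
    ("share/a.log", "10:00:00 user=alice action=open"),
    ("share/a.log", "10:00:05 user=bob action=close"),
    ("share/b.log", "10:00:00 user=alice action=open"),
]

dupes = find_duplicates(sample)
# Extra copies beyond the first occurrence of each event.
duplicate_event_count = sum(n - 1 for n in dupes.values())
print(f"{duplicate_event_count} duplicate event(s) in {len(sample)} sampled")
```

Note that some logs legitimately contain identical lines, so a handful of matches is not proof of re-indexing. If a large share of events from the same source appear exactly twice or more, duplication becomes the more likely explanation.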
