Recently, I added a file share for indexing via a Universal Forwarder on a Windows server, which sends to the receiver/deployment server (Linux). Yesterday the total volume of raw data on the file share was 20 GB, and today it is 21 GB. The indexed volume for that file share was 7 GB yesterday (daily volume) and 7.123 GB today (again daily volume), which is much larger than the increase in the raw data. In inputs.conf, I set the global parameter ignoreOlderThan = 7d. Is it re-indexing the last 7 days every 24 hours, or is something else going on? How can I determine the cause and avoid it? Note: there are no zip files, and .xml files have been excluded from indexing.
How much of the data was Splunk able to index on the first day? Of the 20 GB, how much should Splunk have indexed, and how much did it actually index? I wonder if Splunk is "catching up."
ignoreOlderThan=7d
will not cause the data to be indexed twice.
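For reference, here is a minimal sketch of how such an input might look in inputs.conf on the forwarder. The share path and index name are hypothetical placeholders, not taken from the original post. As far as I understand it, ignoreOlderThan is compared against each file's modification time, so it only controls which files the monitor considers at all; it does not define a window of events that gets re-read every day.

```
# inputs.conf on the Universal Forwarder (sketch; path and index are assumed)
[monitor://\\fileserver\share\logs]
# Skip files whose modification time is more than 7 days old
ignoreOlderThan = 7d
# Exclude .xml files from indexing
blacklist = \.xml$
index = fileshare_index
```

Note that a file which is modified again falls back inside the window, so frequently rewritten files are worth checking separately.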
I would turn on the Monitoring Console and look at the tabs on indexing for information, as a starting point.
day 1: 7GB; day 2: 7.531GB; day 3: 1.45GB (we disabled the index, changed ignoreOlderThan to 2d, and redeployed); day 4: 6.421GB
How can I tell whether the index is duplicating data?
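One possible way to check, as a rough sketch rather than a definitive method: hash the raw text of each event and count how often the same hash appears per source file. The index name fileshare_index is a hypothetical placeholder; run it over a limited time range first, since grouping by a hash of every event can be expensive on large indexes.

```
index=fileshare_index
| eval raw_hash=md5(_raw)
| stats count AS copies BY source, raw_hash
| where copies > 1
| eval extra=copies-1
| stats sum(extra) AS duplicate_events BY source
| sort - duplicate_events
```

If this returns rows, the listed source files contain events with identical raw text more than once. That suggests re-indexing, although files that legitimately repeat the same line will also show up here.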