Getting Data In

Data reindexed after UF app updated using deployment server

beaumaris
Communicator

We are using a 4.2.1 Universal Forwarder (UF) node to monitor a directory that contains web access log files and send those files to an indexer. All of our nodes, including the UF and IDX nodes, have custom apps deployed to them from a central deployment server running on a Job Scheduler node. We're observing that if we update the inputs.conf file that defines the monitoring on the UF node, then once the updated app is deployed to the UF (which causes Splunk to restart), all of the web access log files are re-sent to the indexer, which causes duplicate events in the system.

It seems like Splunk would do the bookkeeping for the monitoring process someplace outside of the .../splunk/etc/apps directory, which is the only thing changing in the above scenario. We would not expect the files to be read and sent again just because a custom app was updated. Has anyone seen this issue, and is there something we can do to prevent the files from being re-read? After all, if we simply restart Splunk on the UF node, the files are not read again.

Drainy
Champion

Splunk uses something called the fishbucket (don't worry about the name, that's just what it's called 🙂 ) to track which files it has read and how far into each file it has read, so this is managed outside of the config.
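
To make the idea concrete, here is a toy model of that kind of bookkeeping. It is only a sketch and not Splunk's actual implementation: the class name ToyFishbucket, the use of CRC32, and the 256-byte head length are all assumptions chosen for illustration. The point is that a file is recognised by a checksum of its first bytes, and a stored offset says where to resume.

```python
import zlib
from pathlib import Path

# Assumption for this sketch: a file is identified by its first 256 bytes.
HEAD_BYTES = 256


class ToyFishbucket:
    """Toy model: maps a checksum of a file's head to the offset already read."""

    def __init__(self):
        # head checksum -> number of bytes already consumed
        self.offsets = {}

    def read_new(self, path):
        """Return only the bytes not yet seen for the file at `path`."""
        data = Path(path).read_bytes()
        key = zlib.crc32(data[:HEAD_BYTES])
        # An unknown head checksum means "new file": start from byte 0.
        start = self.offsets.get(key, 0)
        self.offsets[key] = len(data)
        return data[start:]
```

In this model, calling read_new() twice on an unchanged file returns nothing the second time, and a plain restart is harmless as long as the offsets survive. If the stored state is lost, or the head of the file no longer produces the same checksum, the whole file is read again from the start, which is the symptom described in the question.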

Changing the index target or other details in inputs.conf should not, on its own, make the UF re-index data; for that you would normally have to clean the fishbucket first.

So, first have a look at the splunkd.log file in the SPLUNK_HOME/var/log/splunk/ directory on the UF and see whether it gives any indication of why the files are being indexed again, or whether there are any errors related to the fishbucket.
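
If the log is large, a small script can narrow it down. The sketch below simply filters splunkd.log for lines containing a few substrings. The keyword list is an assumption about what might be worth looking for, not an official list of Splunk messages, and the function names are made up for this example, so adjust both to what you actually see in your log.

```python
from pathlib import Path

# Assumption: substrings that may point at file-tracking activity.
KEYWORDS = ("fishbucket", "WatchedFile", "TailingProcessor", "re-read")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    return [
        ln.rstrip("\n")
        for ln in lines
        if any(k in ln.lower() for k in lowered)
    ]


def scan_splunkd_log(splunk_home):
    """Scan SPLUNK_HOME/var/log/splunk/splunkd.log for suspect lines."""
    log_path = Path(splunk_home) / "var" / "log" / "splunk" / "splunkd.log"
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with log_path.open(errors="replace") as fh:
        return find_suspect_lines(fh)
```

Call scan_splunkd_log() with the forwarder's SPLUNK_HOME and print the result. Comparing the lines logged just after the app deployment with those logged after a plain restart may show what differs between the two cases.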

The other possibility is log rotation, or the way the logs are appended to. Either could interfere with how Splunk monitors the log file and make it think the file has changed substantially enough to index the whole thing again.

beaumaris
Communicator

Anyone?
This is still an issue for us and we do not expect the UF to re-send already processed logs to the indexer after an application is loaded and Splunk restarted.
