Getting Data In

Duplicate data when monitoring a directory

rewritex
Contributor

Logs land in the logfile on the syslog server, and logrotate/timestamp.script runs to roll the logs.
The problem I am having is that duplicate data is coming into the Splunk index. Splunk seems to be reading each rolled .bz2 file as new data while also reading the logfile as new data. Below are my configurations. Any ideas? Thanks.

inputs.conf is below:

[monitor:///logs/f5-host1/]
_TCP_ROUTING = group1
disabled = false
host = host1
index = index1
sourcetype = index1
whitelist = \.bz2$|/logfile$
ignoreOlderThan = 20d

The log folder is below:

-rw------- 1 Logz Logz  97138 Aug  1 13:04 f5-host1.2017-08-01-12.bz2
-rw------- 1 Logz Logz  105819 Aug  1 14:05 f5-host1.2017-08-01-13.bz2
-rw------- 1 Logz Logz  95384 Aug  1 15:05 f5-host1.2017-08-01-14.bz2
-rw------- 1 Logz Logz 342285 Aug  1 15:16 logfile

gcusello
SplunkTrust

Hi rewritex,
why do you also have the rolled files in your whitelist?
Try this:

[monitor:///logs/f5-host1/logfile]
_TCP_ROUTING = group1
disabled = false
host = host1
index = index1
sourcetype = index1
ignoreOlderThan = 20d

Also, why are you using the index name as the sourcetype?
Bye.
Giuseppe


rewritex
Contributor

Thanks for the response, gcusello. I put the full explanation in the previous comment. I whitelist the rolled files to guard against data loss during network downtime. The sourcetype = index1 value is fake; I changed it just for this post.


woodcock
Esteemed Legend

It looks like you have it set up to read both the .bz2 files and the raw logfile. While that was probably the right thing to do the first time you fired up Splunk on your forwarder (to pick up the data you had missed that was already rolled), you need to remove the .bz2 pattern from the whitelist now, or every time your logfile rotates to a .bz2 it will get forwarded again.
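
A minimal sketch of what that adjustment might look like: the same directory monitor, but with only the live logfile whitelisted (all other settings are carried over from the original stanza):

[monitor:///logs/f5-host1/]
_TCP_ROUTING = group1
disabled = false
host = host1
index = index1
sourcetype = index1
# match only the live logfile; rolled .bz2 copies no longer match
whitelist = /logfile$
ignoreOlderThan = 20d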


rewritex
Contributor

Thanks. You seem to answer the majority of my questions on this board.

The original idea was to monitor the directory to guard against data loss between the host and the indexer cluster. If the connection between the host and the cluster went down, I wanted Splunk to automatically pick back up where it left off once the connection was restored, scanning the rotated logs to pull in whatever it needed to fill in the blanks from the downtime.

I researched the forums and the Splunk docs to set up inputs.conf to monitor both the logfile and the rolled log files for that scenario. I was under the impression that the rotated logs would not be read again because the CRC check would match and Splunk would skip them.


gjanders
SplunkTrust

The checksum will have changed once the data is compressed into bz2 format, so Splunk will read the data again. You could blacklist the files, or adjust the whitelist, in inputs.conf (as per https://answers.splunk.com/answers/560579/splunk-forwarder-configuration.html#answer-560580 ).
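
For the blacklist approach, a sketch of the same directory monitor that excludes the compressed copies instead (every setting except blacklist is carried over from the original post):

[monitor:///logs/f5-host1/]
_TCP_ROUTING = group1
disabled = false
host = host1
index = index1
sourcetype = index1
# exclude the rotated, compressed copies so they are not re-indexed
blacklist = \.bz2$
ignoreOlderThan = 20d

Note that if both are set, blacklist takes precedence over whitelist in monitor inputs.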
