Why does Splunk (re-)index this rolled file? How to troubleshoot?

twinspop
Influencer

Inputs stanza from btool:

[monitor:///apps/Logs/*/www/Reporting/CRTLog.log*]
_rcvbuf = 1572864
disabled = 0
host = apphost1
index = reporting_main
sourcetype = reporting_crtlog

The log rotation they use keeps 10 rolled copies, suffixed .1 through .10. E.g., when the original rolls it is renamed CRTLog.log.1 and a new CRTLog.log file is created. Standard stuff.

I have confirmed, without a doubt, the rolled files maintain consistent content. I wrote a script to grab checksums of the first 1KB of each file every few seconds. They always check out -- .1's checksum matches what the original showed before rolling.
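A minimal sketch of that kind of checksum script, not the original one. The function name head_sums and its arguments are hypothetical; the 1024-byte default mirrors the 1 KB mentioned above, and md5sum is assumed to be available on the host.

```shell
# head_sums DIR PREFIX [NBYTES]
# Print "<file name>: <md5 of the first NBYTES bytes>" for every regular
# file in DIR whose name starts with PREFIX. (Hypothetical helper.)
head_sums() {
    local dir="$1" prefix="$2" nbytes="${3:-1024}" f
    for f in "$dir"/"$prefix"*; do
        [ -f "$f" ] || continue   # skip directories and an unmatched glob
        printf '%s: %s\n' "${f##*/}" \
            "$(head -c "$nbytes" "$f" | md5sum | cut -d' ' -f1)"
    done
}
```

Run in a loop (for example `while sleep 5; do date; head_sums /apps/Logs/apphost1/www/Reporting CRTLog.log; done`), it produces a running record of what each file's leading bytes looked like before and after a roll.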

However, Splunk is sometimes (not all the time) treating the 1st rolled file as a new file:

 WatchedFile - Will begin reading at offset=0 for file='/apps/Logs/apphost1/www/Reporting/CRTLog.log.1'
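To put a number on how often this happens, the matching lines can be pulled out of splunkd.log. This is only a sketch: reread_events is a hypothetical helper, and it matches on the message text quoted above rather than on any documented log format.

```shell
# reread_events LOGFILE PATTERN
# Print the WatchedFile "offset=0" messages in LOGFILE whose text contains
# PATTERN (a fixed string, not a regex). (Hypothetical helper.)
reread_events() {
    local log="$1" pattern="$2"
    grep -F 'WatchedFile' "$log" \
        | grep -F 'Will begin reading at offset=0' \
        | grep -F -- "$pattern"
}
```

For example, `reread_events "$SPLUNK_HOME/var/log/splunk/splunkd.log" "CRTLog.log.1'" | wc -l` counts the re-reads of the first rolled file; the trailing quote in the pattern keeps CRTLog.log.10 from matching. Comparing that count against the number of rolls in the same period gives the failure rate.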

Probably 30% of the time it re-reads the rolled file. Only .1, never any of the others.

Any tips to further troubleshoot this?

(Ticket's open, but after 3 days I kinda need an answer.)

EDIT: Sample checksum comparison.

To grab a list I use:

for f in *; do printf '%s: ' "$f"; head -n 50 "$f" | md5sum; done

CRTLog.log: 0fb375c11ad382eec3cc482fb1332c81  -
CRTLog.log.1: 40f3878392f5ca816bfc4948b263d0e2  -
CRTLog.log.10: ffc1a6dec71a64f69a2f4c42b53d68cb  -
CRTLog.log.2: a3b7d786d8aa7260cc5e46635e764c8f  -
<snip>

Then wait for a roll to fire and grab the new list:

CRTLog.log: ad978fdb89b04169e95ba96c15887042  -
CRTLog.log.1: 0fb375c11ad382eec3cc482fb1332c81  -
CRTLog.log.10: 82d1b645c89e4e34b4e0a89712d30f3e  -
CRTLog.log.2: 40f3878392f5ca816bfc4948b263d0e2  -
CRTLog.log.3: a3b7d786d8aa7260cc5e46635e764c8f  -
<snip>

So the first 50 lines (about 16 KB of data) match before and after the roll to .1, yet Splunk re-read the file in this case.
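The before/after comparison can be automated. The sketch below assumes two snapshot files in the listing format shown above (file name, colon, checksum) and the .N suffix scheme described in this post; check_roll and the snapshot file names are hypothetical.

```shell
# check_roll BEFORE AFTER
# For each rolled file NAME.N in the AFTER snapshot, compare its checksum
# with the checksum its predecessor (NAME for N=1, else NAME.(N-1)) had in
# the BEFORE snapshot. (Hypothetical helper.)
check_roll() {
    local before="$1" after="$2"
    awk '
        { name = $1; sub(/:$/, "", name) }      # strip the trailing colon
        NR == FNR { sum[name] = $2; next }      # first file: pre-roll sums
        {
            n = split(name, parts, ".")
            idx = parts[n]
            if (idx !~ /^[0-9]+$/) next         # live file: no predecessor
            base = name
            sub(/\.[0-9]+$/, "", base)
            prev = (idx + 0 == 1) ? base : base "." (idx - 1)
            if (!(prev in sum)) next            # predecessor not captured
            status = (sum[prev] == $2) ? "OK" : "MISMATCH"
            print name, "<-", prev, status
        }
    ' "$before" "$after"
}
```

With the listings saved as before.txt and after.txt, `check_roll before.txt after.txt` prints one line per rolled file whose predecessor appears in the first snapshot. A MISMATCH would mean the rotation changed the start of the file; all OK points back at the forwarder.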

1 Solution

hrawat_splunk
Splunk Employee

This issue is resolved in:
7.1 (SPL-149198)
7.0.4 (SPL-153453)
6.6.7 (SPL-146190)
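
A rough way to check a forwarder's version against the releases listed above. has_roll_fix is a hypothetical helper; it assumes "7.1" means 7.1.0, and that every later release line also carries the fix, which this answer does not state explicitly. The version string would come from running `splunk version` on the forwarder.

```shell
# has_roll_fix VERSION
# Succeed (exit 0) if VERSION is at or past one of the fixed releases:
# 6.6.7+ on the 6.6 line, 7.0.4+ on the 7.0 line, or 7.1 and later.
# ASSUMPTION: releases newer than 7.1 include the fix. (Hypothetical helper.)
has_roll_fix() {
    # Split "7.0.4" into 7 0 4; missing components default to 0.
    set -- $(printf '%s' "$1" | tr '.' ' ')
    local major="${1:-0}" minor="${2:-0}" patch="${3:-0}"
    if [ "$major" -gt 7 ]; then return 0; fi
    if [ "$major" -eq 7 ] && [ "$minor" -ge 1 ]; then return 0; fi
    if [ "$major" -eq 7 ] && [ "$minor" -eq 0 ] && [ "$patch" -ge 4 ]; then return 0; fi
    if [ "$major" -eq 6 ] && [ "$minor" -eq 6 ] && [ "$patch" -ge 7 ]; then return 0; fi
    return 1
}
```

For example, `has_roll_fix 7.0.3 || echo "upgrade needed"` would flag a 7.0.3 forwarder, while 6.6.7 and 7.1.2 pass.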

rewritex
Contributor

Hello Jon... Any luck with an answer or resolution on this issue?

twinspop
Influencer

No. Splunk Support was not helpful, wasting hours of work. I eventually told the user to only index the current file and realize that some logs will be lost at roll time. It's a horrible solution, but I can't get anyone at Splunk to care.

rewritex
Contributor

I didn't realize my question was already asked ... sorry about that.

A recent issue I had with getting data in: I had to remove my * and pull in the whole directory. My [monitor:///Logs/isam/reports/access.log*] became [monitor:///Logs/isam/reports/access.log/], and that worked for me. It had to monitor the whole directory instead of using the wildcard on the log name. I also kept running into a problem with the whitelist parameter, so I dropped it. I used $SPLUNK_HOME/bin/splunk list monitor (run on my UF) to show which files and directories were being monitored. That highlighted a regex issue in another stanza, where I had escaped a character incorrectly. Good luck.

twinspop
Influencer

EDIT: Spoke too soon. Just got lucky with a string of good rolls. The 13th one failed. Same scenario. Sigh.

This looks like the fix (EDIT: nope).

Bad:

[monitor:///apps/Logs/*/www/Reporting/CRTLog.log*]

Good:

[monitor:///apps/Logs/*/www/Reporting/]
whitelist = CRTLog

That seems like a bug to me. Not sure what's triggering it because I use the "Bad" style above in literally a thousand different scenarios. This is the first that's bitten me.

Thanks!

rewritex
Contributor

It looks like a pretty standard inputs.conf stanza. How about the CRTLog.log* in your monitor line, [monitor:///apps/Logs/*/www/Reporting/CRTLog.log*]? Have you tried it without the * at the end, i.e. just [monitor:///apps/Logs/*/www/Reporting/CRTLog.log]? Otherwise, I like the blacklist idea from woodcock, or maybe the log-roll naming could be changed.

twinspop
Influencer

Identifying files regardless of name, so that rolled files are handled correctly, is a core feature of Splunk. And in this case, it's required for us. Without the asterisk we very noticeably miss log entries. Currently our choice is to miss log entries or have double entries. Not optimal! 🙂

woodcock
Esteemed Legend

I don't know why, but you should just blacklist the *log.1 file and be done with it.

twinspop
Influencer

I have a sneaking feeling I would just see .2 show up as a dup. So the next step would be to drop the * and just log the original... but then we get missed logs. (Busy log file)

cpetterborg
SplunkTrust

Why even put an asterisk after .log in the monitor line? As long as the events were indexed while the file was still named CRTLog.log, there is no need to ever look at the rolled copies again:

[monitor:///apps/Logs/*/www/Reporting/CRTLog.log]

If this is pushed out to a new host via the deployment server, I can see why you would want the old files indexed, but that is the only case I can see for adding the * on the end of the line.

One more case for not having the asterisk is that it requires less CPU and memory to look at just one file vs. 11 files.

Just tryin' to keep it simple. 🙂

twinspop
Influencer

It's a busy log file. It often rolls before Splunk has finished reading the last X entries. Including the rolled files in the monitor entry is best practice -- if not officially from Splunk, definitely in my experience. Usually it works fine, I'm just at a loss to explain why it's failing in this case.
