Getting Data In

Can you help me with an issue with monitoring files which have log rotation after a certain size?

thirusama
Path Finder

We noticed that, right after a log rotation, data from the new file is not indexed until the next rotation. For example, say a file is rotated at 8 AM, and everything up to that point has already been indexed. The next file is written from 8 AM to 7 PM, but it is not indexed until around 7 PM.

We are on Universal Forwarder 7.0.3.

Below is the monitor stanza:

[monitor:///opt/mapr/hadoop/hadoop/logs/*nodemanager*]
sourcetype = my_st
index = my_index
disabled = 0
ignoreOlderThan = 2h

We added ignoreOlderThan = 2h recently to see if it would help, but the issue persists.

The current file is named yarn-mapr-nodemanager-host_name.log and the most recently archived file is named yarn-mapr-nodemanager-host_name.log.1.

What is interesting is that, intermittently and on certain servers, the current file gets indexed only at the time of its roll/archival (say, after 10-11 hours), and it is indexed under the live file name, not the archive file name. The problem does not happen every time: the next live file might be indexed on time. There should be an ideal set of settings to avoid this.

Any insights on this will be helpful.

Splunk's documented handling of rotated log files seems to have a bug here. Are we missing anything? Please advise.

0 Karma

woodcock
Esteemed Legend

There was a point at the beginning where everything was working fine, right? And if you restart Splunk, it starts to catch up but then falls behind again, right? That is what happens when there are thousands of files in the directory which Splunk has to dig through. You can either install housekeeping rules that move/delete files that have not been modified for X days/hours OR create soft links. Check out my answer here:

https://answers.splunk.com/answers/309910/how-to-monitor-a-folder-for-newest-files-only-file.html

0 Karma

thirusama
Path Finder

So in this case, will there be a delay of 5 minutes, or maybe not? Please clarify. We would have to work with a different team to put this cron job on 500+ nodes.

0 Karma

woodcock
Esteemed Legend

So you do have thousands of files in that same directory?

0 Karma

thirusama
Path Finder

No. We monitor 500+ nodes/hosts. Each node has 20 archived files (.log.*) + 1 current .log + 1 current .out, i.e. a total of 22 files per node/host. We still feel that there should be a straightforward setting/solution for this. It would be very difficult to roll out a workaround (soft links) on 500+ nodes, especially since we would have to convince another team.

The problem occurs on around 5-10 nodes each day.

0 Karma

woodcock
Esteemed Legend

It doesn't matter how many files are being monitored there; what matters is how many files exist there in total. Are your 22 the only files there, or are there hundreds/thousands of others?

0 Karma

thirusama
Path Finder

Yes. There are just 22 files in the monitored directory with nodemanager in the name. There is one sub-directory containing around 57 sub-directories, but the names there do not contain nodemanager. So there are probably around 57*5*3 = 855 unwanted files.
Should we try adding recursive = false, just to avoid scanning the sub-directory?

0 Karma

woodcock
Esteemed Legend

Definitely add that setting, but it should not be necessary because you have no wildcards in your path, right?

0 Karma

thirusama
Path Finder

We do have wildcards in the file name. I think Splunk adds a WATCH on that path, which means it might look into sub-directories by default? Anyway, we will add recursive = false and monitor for a few days.

0 Karma

woodcock
Esteemed Legend

The ignoreOlderThan setting is definitely not going to help and will certainly cause other problems, so take it out.
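
As a rough sketch of where the suggestions in this thread lead (an untested combination, not a confirmed fix for the delayed indexing), the original stanza with recursive = false added and ignoreOlderThan removed would look something like this. The path, sourcetype, and index are copied unchanged from the original post:

```
[monitor:///opt/mapr/hadoop/hadoop/logs/*nodemanager*]
sourcetype = my_st
index = my_index
disabled = 0
# Do not descend into sub-directories of the monitored path
recursive = false
# ignoreOlderThan = 2h has been removed, per the advice in this thread
```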

0 Karma

thirusama
Path Finder

Okay. We wanted to see if it would help reduce load on the forwarder, since there are 20 archived files.

Also, I forgot to mention that there is another file with the extension .out (yarn-mapr-nodemanager-host_name.out), which seems to be ingested fine under the same sourcetype even when the other file(s) have the issue.

0 Karma