Monitoring Splunk

Why is my log quota being eaten up by "invisible" files?

grijhwani
Motivator

Using the following search, I find that in the hour after midnight there is a spike in indexing activity:

index="_internal" source="*license_usage.log" | eval ISODate=strftime(strptime(date_year."-".date_month."-".date_mday, "%Y-%b-%d"), "%Y-%m-%d (%a)") | eval kB=b/1024 | chart eval(round(sum(kB),0)) over date_hour by ISODate limit=0  | addcoltotals labelfield=date_hour

Activating "heat map" presentation instantly reveals a red blip just after midnight every day. This suggests some kind of daily rollover processing is impacting Splunk's indexing. In addition, in the first few days of installation the post-midnight log consumption grew on a daily basis, further suggesting it is dependent on the age of the Splunk installation.

1 Solution

grijhwani
Motivator

I found that this is because Splunk rotates its own log files and retains them for five days, but it does not ship with a blacklist to automatically suppress the rotated copies. Net result: Splunk's logs are re-indexed at midnight, ultimately five times each.
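To illustrate (exact file names and counts will vary by installation and log channel), the rotated copies sit alongside the live logs in the directory Splunk monitors by default, e.g.:

ls $SPLUNK_HOME/var/log/splunk/splunkd.log*
splunkd.log  splunkd.log.1  splunkd.log.2  splunkd.log.3  splunkd.log.4  splunkd.log.5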

The solution was to add a default blacklist to $SPLUNK_HOME/etc/system/local/inputs.conf, under the [default] stanza:

blacklist = .*\.(\d+|(gz|bz2))$

After a restart of Splunk, this suppresses any log files (not just Splunk's, but any monitored system-wide) whose names end in a rotation sequence number or a compression suffix. Obviously this does not work retrospectively, but it stops any future rotated duplicates from being re-indexed.
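In other words, the relevant part of $SPLUNK_HOME/etc/system/local/inputs.conf ends up looking something like this (assuming you have no other settings in that stanza):

[default]
blacklist = .*\.(\d+|(gz|bz2))$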

It is also possible to see, in real time, which input sources are being processed and which are not, but only after tinkering with your access configuration first.

I found this management URL useful in determining which sources match which whitelist/blacklist filters and are or are not being consumed:

https://{yoursplunkserver}:8089/services/admin/inputstatus/TailingProcessor:FileStatus
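The same endpoint can also be queried from the command line via the REST API; something like the following should work, assuming the default management port and substituting your own credentials:

curl -k -u admin:yourpassword https://yoursplunkserver:8089/services/admin/inputstatus/TailingProcessor:FileStatus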

grijhwani
Motivator

I posted the question and the answer together because it took me a few months to track the "obvious" down, and I thought it might be informative to other users. I am aware that the time conversion in the search is sloppy, but I first created it months ago, before I learned about the "convert" function.
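For anyone reusing the search, a tidier variant (sketched here untested) would derive the date label directly from _time instead of reassembling it from the date_* fields, along these lines:

index="_internal" source="*license_usage.log"
| eval ISODate=strftime(_time, "%Y-%m-%d (%a)")
| eval kB=b/1024
| chart limit=0 eval(round(sum(kB),0)) AS kB over date_hour by ISODate
| addcoltotals labelfield=date_hour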

grijhwani
Motivator

I have since checked the Enterprise installation at work and found that exactly the same thing was happening there, but in comparison with the production logging the duplicated Splunk data were a drop in the ocean.
