
reindex data due to downtime & syslog-ng config

rewritex
Contributor

I ran into some downtime, and it turns out Splunk didn't pick back up from where it last received data, so I am trying to fill in the gaps.
I'm switching to the stanza below for monitoring syslog ... The docs say I shouldn't have to worry about duplicate events; I hope that is true.

1) Any advice (or best practice methods) to pull in the missing data from a downtime period?

I'm thinking oneshot, or copying the rotated logs for the timeframe into a new folder and starting/stopping monitoring of that folder...
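Something like this is what I have in mind for the folder variant: a batch input with move_policy = sinkhole indexes each staged file once and then deletes it, so there is no start/stop dance (/logs/exchange_backfill is a hypothetical staging directory; copy the rotated .bz2 files there rather than moving the originals):

[batch:///logs/exchange_backfill]
# sinkhole is required for batch inputs: each file is deleted after indexing
move_policy = sinkhole
disabled = false
index = exchange
sourcetype = cas_connections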

2) Also does this syslog-ng configuration below look good and will it handle downtime issues moving forward?

Thank You! ~Sean

I have a forwarder monitoring my syslog-ng logs ...
To start, it seems I need to update my UF inputs.conf to monitor the syslog-ng directory instead of just the live logfile. I am now doing this:

[monitor:///logs/exchange/]
_TCP_ROUTING = group1
disabled = false
index = exchange
sourcetype = cas_connections
# pick up the rotated hourly archives plus the live file, nothing else
whitelist = \.bz2$|/logfile$
# take the host name from the second segment of the file path
host_segment = 2
# skip files whose modtime is older than 14 days (a file-based setting)
ignoreOlderThan = 14d

My log directory structure looks like this:

3128940 Apr 14 00:00 exchange.2017-04-13-23.bz2
3187581 Apr 14 01:00 exchange.2017-04-14-00.bz2
2629204 Apr 14 02:00 exchange.2017-04-14-01.bz2
2453433 Apr 14 03:00 exchange.2017-04-14-02.bz2
2493487 Apr 14 04:00 exchange.2017-04-14-03.bz2
2517451 Apr 14 05:00 exchange.2017-04-14-04.bz2
3143645 Apr 14 06:00 exchange.2017-04-14-05.bz2
6790645 Apr 14 07:01 exchange.2017-04-14-06.bz2
19249034 Apr 14 08:02 exchange.2017-04-14-07.bz2
32589018 Apr 14 09:05 exchange.2017-04-14-08.bz2
29032985 Apr 14 10:05 exchange.2017-04-14-09.bz2
26370374 Apr 14 11:05 exchange.2017-04-14-10.bz2
132687878 Apr 14 11:20 logfile

gjanders
SplunkTrust

I use the method from https://answers.splunk.com/answers/451674/how-do-i-use-syslogng-to-replace-splunk-tcp-or-udp.html#an... to have syslog-ng write the log files out per hour.
I then have a separate script delete any files modified more than X hours ago (4 hours, from memory).

Since Splunk is reading the log files, and the log files are left on the filesystem for at least 4 hours (you could make this longer), it will catch up from where it left off. So I should not have to worry about Splunk needing to restart, as picking up where it left off is default Splunk behavior.
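Roughly, the setup looks like this (the file naming and the exact retention window here are illustrative, not my exact config):

# syslog-ng destination: one file per host per hour, using syslog-ng's date macros
destination d_exchange {
    file("/logs/exchange/$HOST/exchange.$YEAR-$MONTH-$DAY-$HOUR.log");
};

# cron entry: hourly, delete files not modified in the last 4 hours (240 minutes)
0 * * * * find /logs/exchange -type f -mmin +240 -delete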


woodcock
Esteemed Legend

IMHO, using oneshot from backups is DEFINITELY the way to go to plug the gaps, especially because oneshot bypasses the fishbucket entirely.

I would never use ignoreOlderThan with any value other than the desired retention period of your indexed data, because it is a FILE-based (not an event-based) setting. Once a file is skipped by this setting, it is permanently blacklisted in the fishbucket; even if new data comes into the file and it is now "30 seconds old", too bad: no indexing for you! So if the retention period of this data is 14 days (or shorter), then your 14d setting is (sort of) fine (maybe).
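Something like this, run on the forwarder (the paths, the host value, and the -host flag are from memory, so double-check with splunk help oneshot; note that oneshot will not apply monitor-stanza settings such as host_segment, so set the metadata explicitly):

# Splunk decompresses .bz2 archives on ingestion
/opt/splunkforwarder/bin/splunk add oneshot /logs/exchange/exchange.2017-04-14-03.bz2 \
    -index exchange -sourcetype cas_connections -host myhost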

gjanders
SplunkTrust

Just to reiterate woodcock's point, here is what Splunk Support said when discussing ignoreOlderThan in a support case:

    Using that attribute will cause files to be ignored even after new data has been added to them. Once the file is ignored it will always be ignored.

    This is not a bug. This is expected behavior for that feature.

    The only way to stop this is to remove that feature and restart Splunk.

    Thank you, Splunk Support

I would avoid using that attribute where possible, as restarting the Splunk forwarder is the only way I know of to reset the ignore property...
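If you want to experiment, btprobe can inspect and reset individual fishbucket records without a restart; I cannot confirm it clears the ignoreOlderThan ignore flag specifically, so treat this as a sketch (default forwarder install paths assumed):

# reset the fishbucket record for one file so the forwarder re-evaluates it
/opt/splunkforwarder/bin/splunk cmd btprobe \
    -d /opt/splunkforwarder/var/lib/splunk/fishbucket/splunk_private_db \
    --file /logs/exchange/logfile --reset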


hardikJsheth
Motivator

Splunk maintains the last location it has read for each monitored file in an index called the fishbucket.

You should not worry even if you are monitoring the same file. The problem can only arise in a scenario where your file is larger than 256 bytes (the length of the initial CRC Splunk uses to identify a file) and Splunk has already partially indexed it.
In that case, the best approach is to rename the old file so that syslog-ng starts writing new logs to a fresh file and Splunk indexes the new data.

For the older file, there are two options:

1. Find the last event that Splunk read, remove all the events that were already indexed, and then use the oneshot command.
2. If you don't mind re-indexing some data from the file, add a comment at the top of the file (this changes the CRC, so Splunk treats it as a new file) and then use the oneshot command.

I would prefer the first option; it's a bit tedious, but it's the right way. A rough sketch of it is below.
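For example, assuming ISO-8601 timestamps at the start of each line and a hypothetical last-indexed timestamp taken from a search such as index=exchange | head 1:

# keep only the lines newer than the last indexed event
# (lexical comparison works when field 1 is an ISO-8601 timestamp)
bzcat exchange.2017-04-14-03.bz2 \
    | awk -v last="2017-04-14T03:27:15" '$1 > last' > /tmp/exchange.gap.log
# then index /tmp/exchange.gap.log with "splunk add oneshot" as shown above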
