How do I stop duplicate entries from PM2 logs from being indexed in Splunk?

Carpesimia
New Member

We use Splunk forwarders to send log files written by PM2 (a Node.js process manager) to our Splunk indexers. PM2 has its own log rotation feature and creates a dated backup file when a log reaches the configured rotation limit. These rotated files sit in the same folder as the live log, and we index *.log. We DO want the rotated files to be evaluated, because the forwarders may be down at some point and we don't want to miss anything that was logged in the meantime.

Example:

prog.log
prog_2018-01-03.log
prog_2018-01-02.log
prog_2018-01-01.log

In the above scenario, how do we keep events that were already indexed from prog.log from being indexed again when the file is rotated out as prog_date.log? Keep in mind that we don't want to miss any entries during outages, so we want to continue processing the dated logs as a backup.

We just upgraded to splunkforwarder 7.0.4 because we were under the impression it would help with this, but we are still seeing the same results.

somesoni2
Revered Legend

Generally, Splunk does not re-ingest the content of a renamed file that it has already read, but it will do so when you use crcSalt = <SOURCE> in the inputs.conf stanza. Could you share the full inputs.conf stanza your forwarder uses to monitor these files?

Carpesimia
New Member

Yeah, I've got that, but I'd added it for another reason. Here's the stanza in question:

[monitor:///var/log/mservices/]
sourcetype = microservices_log
index = mservices
disabled = false
blacklist = .(bz2|gz)$
crcSalt = <SOURCE>

Is there something else I could use for the crcSalt that would alleviate this issue?

somesoni2
Revered Legend

What was the reason you added crcSalt?

Carpesimia
New Member

It was done many months ago by a member of my team who is no longer here. I think it was to prevent double indexing within the main log file itself, if I'm not mistaken.

somesoni2
Revered Legend

The official description of crcSalt = <SOURCE> is this:

The crcSalt attribute, when set to <SOURCE>, ensures that each file has a unique CRC. The effect of this setting is that Splunk Enterprise assumes that each path name contains unique content.

So when your monitor stanza picks up both the live logs and the rolled logs, as yours does, you shouldn't use crcSalt = <SOURCE>. A rotated file gets a new path name, so with the salt in place Splunk treats it as new content and indexes it again. Do your files contain some sort of header in their first few lines?
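
For reference, a minimal sketch of the stanza you posted with the salt removed might look like the following. It assumes every other setting stays exactly as you have it; the comments are mine and not part of your original file.

```ini
# inputs.conf on the forwarder (sketch based on the stanza posted above)
[monitor:///var/log/mservices/]
sourcetype = microservices_log
index = mservices
disabled = false
blacklist = .(bz2|gz)$
# crcSalt = <SOURCE> is intentionally left out, so a file that is only
# renamed (prog.log -> prog_2018-01-03.log) is recognized by its content
# rather than treated as a new file because its path changed.
```

Note that removing the setting only affects what gets indexed from now on; events that were already duplicated stay in the index.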

Carpesimia
New Member

No, no headers. Just log lines. Thinking about this more, there may have been some lines being split or something similar that led us to add the crcSalt. I read the definition and have already removed the crcSalt line, so now I just need to wait a day or two and see whether some other weird issue rears its head.

I appreciate the assistance, and hope that this solves everything.
