I've come across an issue where not all of my monitored files are indexed, and I learned that this is because they start with similar long headers. Upon research, I was introduced to setting crcSalt=<SOURCE> as a solution. However, setting this will reindex the contents of files that are already indexed.
The file I am indexing gets updated with new entries at different time intervals. How do I address this to avoid reindexing old entries of the file and only index new lines/events? I am seeing ignoreOlderThan and check_method to deal with this. Can anyone suggest the best way for my scenario? Thanks in advance.
Scenario:
system1.log - 5 events
system2.log - 5 events
system3.log - 5 events
system4.log - 5 events
Due to similar headers (I need to index all files), only the files below are indexed.
system1.log - 5 events
system3.log - 5 events
Incorporating crcSalt=<SOURCE> as a fix:
system1.log - 10 events
system2.log - 5 events
system3.log - 10 events
system4.log - 5 events
How do I avoid reindexing already indexed files to get the result below, and continuously index new events when the files below are updated?
system1.log - 5 events
system2.log - 5 events
system3.log - 5 events
system4.log - 5 events
Appreciate Responses.
-- I came to know that this is because they start with similar long headers. Upon research, I was introduced to setting crcSalt=<SOURCE> as a solution.
For long headers, initCrcLength is the solution and not crcSalt=<SOURCE>. crcSalt=<SOURCE> jeopardizes the entire Splunk algorithm and we need to be careful using it ;-)
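As a rough illustration of the initCrcLength suggestion, a monitor stanza in inputs.conf might look like the sketch below. The monitored path and the value 1024 are assumptions for illustration only. The value would need to be larger than the shared header (the default is 256 bytes), and whether changing it re-triggers indexing of files already read should be verified in non-prod.

```ini
# inputs.conf (sketch; the path and the length are hypothetical)
[monitor:///var/log/myapp/system*.log]
# Read more than the default 256 bytes when computing the initial CRC,
# so files that share a long header can still get distinct checksums.
initCrcLength = 1024
```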
Apparently, these two are often confused, and at our site things became pretty messy because of this confusion.
Wish I could upvote this three or four times.
You are too kind @dwaddle ;-) and you made my day.
Thank you for the response, @ddrillic. I will test this parameter in non-prod and update this thread. Would incorporating initCrcLength cause already indexed events to be reindexed?
I have tried this, but I also ran into already indexed files being reindexed.
So, are you ok now?
You might have to use a combination of ignoreOlderThan and followTail.
Refer to answer: https://answers.splunk.com/answers/508545/how-to-avoid-indexing-events-twice-when-applying-c.html
Refer to Splunk Documentation: https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf#MONITOR:
If your existing logs have already been indexed, you can also remove them prior to turning on crcSalt=<SOURCE>. Please test your inputs.conf changes in a non-production system first to ensure the data being indexed is not getting duplicated.
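A minimal sketch of the ignoreOlderThan and followTail combination mentioned above, assuming a hypothetical monitored path and a seven-day window. Both are real inputs.conf monitor settings, but the values here are placeholders, and as I understand it the Splunk documentation cautions against leaving followTail enabled on rolling log files, so treat this as something to test rather than a recommended configuration.

```ini
# inputs.conf (sketch; the path and the time window are hypothetical)
[monitor:///var/log/myapp/system*.log]
# Add the full source path to the CRC so files with identical headers are told apart.
crcSalt = <SOURCE>
# Skip files whose modification time is older than this window.
ignoreOlderThan = 7d
# On first sight of a file, start reading from the end instead of the beginning.
followTail = 1
```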
Thank you @niketnilay.
Upon testing, though, I am encountering a problem.
All files are continuously updated over time. But once ignoreOlderThan causes a file to be ignored, its new events are not indexed unless the forwarders are restarted.
Are there any other ways to avoid reindexing already indexed files?
@arielpconsolacion, one crude way would be to make this crcSalt change during a maintenance window. First move the files already indexed to a different location (not being monitored by Splunk). Then apply the crcSalt change. Not sure if this is feasible in your case.
Thank you again for this suggestion @niketnilay. The files, however, are rolling log files, so I think this won't be efficient.