Getting Data In

Why are my nearly identically-named log files not being indexed by Splunk?

ebrand
New Member

For our "ATA42_NETWORK" application, we index *.NCD files.
These files are located in an “input directory” that Splunk monitors for automatic indexing:

    /mnt/slv01/import/ATA42_NETWORK/NCD

When we run a search on our search head, we do not find events for all of the *.NCD files.

For instance, on the indexer server, the “input directory” /mnt/slv01/import/ATA42_NETWORK/NCD contains NCD files for A350_MSN_0005 flights 55 and 56:

  [to102071@de0-lxsolp03 NCD]$ pwd
  /mnt/slv01/import/ATA42_NETWORK/NCD
  [to102071@de0-lxsolp03 NCD]$ ls -al A350_0005_F005*
  ...
  -r-xr-Sr-x 1 splunk splunk 138235 Apr  5  2016 A350_0005_F0055_2014_09_24_101025.ncd.gz
  -r-xr-Sr-x 1 splunk splunk 138235 Apr  5  2016 A350_0005_F0056_2014_09_25_105415.ncd.gz
  ...

When we run the following searches on the search head, we find events for flight 55, but there are no events for flight 56:

        index=aib_ata42_ncd source=/mnt/slv01/import/ATA42_NETWORK/NCD/A350_0005_F0055*
        index=aib_ata42_ncd source=/mnt/slv01/import/ATA42_NETWORK/NCD/A350_0005_F0056*

Both NCD files were indexed on the same day (Apr 5 2016) and have the same Unix permissions in the “input directory” (-r-xr-Sr-x).
I retrieved the two NCD files on my laptop and unzipped them, and both are strictly identical (same md5sum):

    ➤ md5sum A350_0005_F0055_2014_09_24_101025.ncd
    8edfdbfd2f2294512a84aed17f58c299 A350_0005_F0055_2014_09_24_101025.ncd
    ➤ md5sum A350_0005_F0056_2014_09_25_105415.ncd
    8edfdbfd2f2294512a84aed17f58c299 A350_0005_F0056_2014_09_25_105415.ncd

My question:
It seems that we have an indexing issue.
Do you have an idea of the possible root cause of the problem?


dbcase
Motivator

Try putting the line below, exactly as typed, in your inputs.conf:

crcSalt = <SOURCE>

From the Splunk docs:

crcSalt = <string>
* Use this setting to force the input to consume files that have matching CRCs
  (cyclic redundancy checks).
    * (The input only performs CRC checks against, by default, the first 256
      bytes of a file. This behavior prevents the input from indexing the same
      file twice, even though you may have renamed it -- as, for example, with
      rolling log files. However, because the CRC is based on only the first
      few lines of the file, it is possible for legitimately different files
      to have matching CRCs, particularly if they have identical headers.)
* If set, <string> is added to the CRC.
* If set to the literal string <SOURCE> (including the angle brackets), the
  full directory path to the source file is added to the CRC. This ensures
  that each file being monitored has a unique CRC.   When crcSalt is invoked,
  it is usually set to <SOURCE>.
* Be cautious about using this setting with rolling log files; it could lead
  to the log file being re-indexed after it has rolled.
* In many situations, initCrcLength can be used to achieve the same goals.
* Defaults to empty.
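
To make the quoted behavior concrete, here is a minimal Python sketch of the idea. It is not Splunk's actual implementation: the plain CRC-32 from `zlib`, the `initial_crc` helper, the way the salt is mixed in, and the sample bytes are all assumptions for illustration. Only the two file paths come from this thread.

```python
import zlib

INIT_CRC_LENGTH = 256  # default number of leading bytes checked, per the docs quoted above


def initial_crc(content: bytes, salt: str = "") -> int:
    """CRC-32 of an optional salt followed by the first INIT_CRC_LENGTH bytes.

    Illustrative only: the real checksum and the way Splunk mixes in the
    salt are assumptions here, not documented facts.
    """
    return zlib.crc32(salt.encode("utf-8") + content[:INIT_CRC_LENGTH])


# Two files with different names but byte-identical content (hypothetical bytes).
content = b"NCD-HEADER;" * 100
path_55 = "/mnt/slv01/import/ATA42_NETWORK/NCD/A350_0005_F0055_2014_09_24_101025.ncd.gz"
path_56 = "/mnt/slv01/import/ATA42_NETWORK/NCD/A350_0005_F0056_2014_09_25_105415.ncd.gz"

# Without a salt the two checksums collide, so the second file looks already seen.
unsalted_collide = initial_crc(content) == initial_crc(content)

# Salting with the source path (the effect of crcSalt = <SOURCE>) separates them.
salted_collide = initial_crc(content, path_55) == initial_crc(content, path_56)

print("unsalted checksums match:", unsalted_collide)
print("salted checksums match:", salted_collide)
```

In this sketch the file name only influences the checksum when a salt is supplied, which is why two identical files with different names are treated as one file unless the source path is added.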

dbcase
Motivator

Try this

    crcSalt = <SOURCE>

ebrand
New Member

Sorry, it was a typo.

We have this in our inputs.conf file:

[monitor:///mnt/slv01/import/ATA42_NETWORK/NCD/*.ncd(.gz)?]
disabled = 0
index = aib_ata42_ncd
sourcetype = ATA42_NCD
crcSalt =


lguinn2
Legend

For each monitored file, Splunk looks at the first part of the file (by default, the first 256 bytes). If it matches the initial part of a file it has already read, Splunk assumes that it has seen this file before and will not index it again. So if your files are identical, Splunk indexes the data only once.
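
As a rough illustration of that "seen before" decision, here is a toy Python sketch. It is an assumption-laden model, not Splunk's code: `FileTracker`, `should_index`, the in-memory set, and the sample payload are hypothetical, and `zlib.crc32` merely stands in for whatever checksum Splunk uses.

```python
import zlib


class FileTracker:
    """Toy stand-in for the monitor input's record of files it has already read."""

    def __init__(self, init_crc_length=256):
        self.init_crc_length = init_crc_length
        self.seen = set()  # checksums of the leading bytes of files read so far

    def should_index(self, content: bytes) -> bool:
        """Return True the first time a leading chunk is seen, False afterwards."""
        crc = zlib.crc32(content[: self.init_crc_length])
        if crc in self.seen:
            return False
        self.seen.add(crc)
        return True


tracker = FileTracker()
payload = b"identical flight data " * 50  # hypothetical file content

# The file name plays no part in the decision; only the leading bytes do.
decisions = {
    "A350_0005_F0055_2014_09_24_101025.ncd.gz": tracker.should_index(payload),
    "A350_0005_F0056_2014_09_25_105415.ncd.gz": tracker.should_index(payload),
}
for name, indexed in decisions.items():
    print(f"{name}: {'indexed' if indexed else 'skipped as already seen'}")
```

Under these assumptions the first file is indexed and the second is skipped, which matches what the searches for flights 55 and 56 show.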

If your question is "why are the files identical," that is not a Splunk problem.

If you want to index duplicate files, you can tell Splunk to do that for a particular input. In inputs.conf, add:

crcSalt=<SOURCE>

You should read the documentation here to be sure that this will not cause other problems for you. Look in the section "Monitor syntax and examples"; it's not long.

ebrand
New Member

Thanks lguinn,

Sorry, I forgot to mention that I am already using the "crcSalt = " option in my inputs.conf file:

[monitor:///mnt/slv01/import/ATA42_NETWORK/NCD/*.ncd(.gz)?]
disabled = 0
index = aib_ata42_ncd
sourcetype = ATA42_NCD
crcSalt =

The indexing issue occurs on a Splunk indexer running version 6.2.0 (build 237341). This is our PRODUCTION server.

I have now tested this on a Splunk Light server running version 6.5.0 (build 59c8927def0f).
The indexing issue does not occur with version 6.5.0 Light.

Could you confirm whether this is a bug in version 6.2.0 (build 237341)?
If so, is there a patch?


lguinn2
Legend

Sorry, you should confirm bugs and status with Support. That is not something that I personally can do, although some folks here on the forum can.

Also, it looks like crcSalt is set to nothing in your inputs.conf; I assume that is a typo.
