Can universal forwarder be set to always send entire csv file?

Tipmoose
Explorer

I am trying to incorporate company name information into sales/subscription charts for business leaders to use in presentations. Because our corporate networking structure is from hell, I've had to do this by automating an SQL table export to CSV on the SQL Server and having the universal forwarder on the same server read the table export and forward it to Splunk.

I've found the forwarder will resend the entire CSV in most cases, except where rows are added to the end of the file (i.e., new firms are added to the db). Further, when it sends just the new rows, the forwarder omits the CSV header row, which makes using multikv a pain. Since I can never be sure what format the latest data will be in (will there be headers? will I have to append the new rows, or can I just use the entire file?), I am trying to see if I can make the forwarder always send the entire CSV to the indexer, regardless of what changes were made to the underlying data. The amount of data being sent is trivial, about 90-100 rows. The most we would see in this table would be a couple thousand.

JSapienza
Contributor

Then you might try throwing some salt at it:

crcSalt = <SOURCE>

crcSalt = <string>
* Use this setting to force Splunk to consume files that have matching CRCs (cyclic redundancy checks). (Splunk only
  performs CRC checks against the first few lines of a file. This behavior prevents Splunk from indexing the same
  file twice, even though you may have renamed it -- as, for example, with rolling log files. However, because the
  CRC is based on only the first few lines of the file, it is possible for legitimately different files to have
  matching CRCs, particularly if they have identical headers.)
* If set, <string> is added to the CRC.
* If set to the literal string <SOURCE> (including the angle brackets), the full directory path to the source file
  is added to the CRC. This ensures that each file being monitored has a unique CRC. When crcSalt is invoked,
  it is usually set to <SOURCE>.
* Be cautious about using this attribute with rolling log files; it could lead to the log file being re-indexed
  after it has rolled.
* Defaults to empty.
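
For context, here is a minimal sketch of where this setting would sit in inputs.conf on the forwarder. The monitor path, sourcetype, and index are hypothetical placeholders, not values from the original post. Whether the salt alone makes the forwarder resend the whole file after every export is worth verifying on a test input before relying on it.

```ini
# inputs.conf on the universal forwarder (sketch only)
# The path below is a hypothetical placeholder for the SQL export location.
[monitor://D:\exports\company_names.csv]
# Add the full source path to the CRC, as described in the spec text above.
crcSalt = <SOURCE>
# Hypothetical values; use whatever your deployment already assigns.
sourcetype = csv
index = main
disabled = false
```

Changes to inputs.conf generally take effect only after the forwarder is restarted.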

Tipmoose
Explorer

Thanks! That might be what I need. I'll check with the admins to see if they'll go for it.

JSapienza
Contributor

In the inputs.conf stanza where you define your monitor, make sure you have:

followTail = 0 or false

http://docs.splunk.com/Documentation/Splunk/5.0.3/Admin/Inputsconf

   followTail = [0|1]
    * WARNING: Use of followTail should be considered an advanced administrative action.
    * Treat this setting as an 'action'.  That is, bring splunk up with this
      setting enabled.  Wait enough time for splunk to identify the related files,
      then disable the setting and restart splunk without it.
    * DO NOT leave followTail enabled in an ongoing fashion.
    * Do not use for rolling log files, or files whose names or paths vary.
    * Can be used to force splunk to skip past all current data for a given stanza. 
      * In more detail: this is intended to mean that if you start up splunk with a
        stanza configured this way, all data in the file at the time it is first
        encountered will not be read.  Only data arriving after that first
        encounter time will be read.
      * This can be used to "skip over" data from old log files, or old portions of
        log files, to get started on current data right away.
    * If set to 1, monitoring begins at the end of the file (like tail -f).
    * If set to 0, Splunk will always start at the beginning of the file. 
    * Defaults to 0.
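
If you would rather have the default stated explicitly than implied, the setting can be written into the monitor stanza. This is a sketch that assumes a hypothetical monitor path; substitute the stanza you already have.

```ini
# inputs.conf (sketch only; the path is a hypothetical placeholder)
[monitor://D:\exports\company_names.csv]
# 0 is already the documented default; it is written out here only to make it visible.
followTail = 0
```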

Tipmoose
Explorer

I went in and looked at the various inputs.conf files under the forwarder, and the followTail setting doesn't appear in any of them. Since the documentation indicates followTail defaults to 0, I'm guessing this is already the behavior. Yet I'm still getting deltas sent to the indexer instead of the entire file.
