Splunk Search

How to skip header in CSV files before indexing?

sander_vandamme
Path Finder

My input files are in the following format (CSV):

Icon Statistics

Time;26.10.2017 00:00 - 27.10.2017 04:40
Service;Servicename
Statistic;Report_servicename

Date;Time;IncomingRequest;InternalSystemDBError;InternalSystemDataError;InternalSystemErrorOther;OK;SDUPTimeout;SDUPError;InvalidIncomingRequest;counter8;counter9;counter10;counter11;counter12;counter13;counter14;counter15;counter16;counter17;counter18;counter19
26.10.2017;00:00;4;0;0;0;4;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
26.10.2017;00:10;2;0;0;0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
26.10.2017;00:20;5;0;0;0;5;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
Total;;1,234;0;0;0;1,224;0;10;0;0;0;0;0;0;0;0;0;0;0;0;0

Before indexing these files, the "header" should be removed.
I configured the Splunk Universal Forwarder to monitor these files in the following way:

[monitor:///opt/ect/data/sdp/mail/statistics/*SDUP*.csv]
index=csdp_prod_stats
source=statistics
sourcetype=csv
crcSalt = <SOURCE>
ignoreOlderThan=14d

On the main Splunk instance, I configured the props.conf:

[csv]
TRANSFORMS-eliminate_header = eliminate_header
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = ;
TIMESTAMP_FIELDS = Date,Time
HEADER_FIELD_LINE_NUMBER = 7

And transforms.conf as following:

[eliminate_header]
REGEX = ^(?:Icon|Time|Service|Statistic|Total)
DEST_KEY = queue
FORMAT = nullQueue

When I check the search in Splunk, it seems the header removal is not working: the complete file is being indexed. What am I doing wrong?

I also want to use the column names from the CSV header line (the one I did not remove) as field names in Splunk. Is "HEADER_FIELD_LINE_NUMBER = 7" in props.conf, as shown above, the correct way to specify this automatic field extraction?

Thank you in advance!

1 Solution

mattymo
Splunk Employee
Splunk Employee

Hey Sander!

You need to make sure you put the props/transforms on the forwarder when dealing with structured data:

"If you want to forward fields that you extract from structured data files to another Splunk instance, you must configure the props.conf settings that define the field extractions on the forwarder that sends the data."

https://docs.splunk.com/Documentation/Splunk/7.0.0/Data/Extractfieldsfromfileswithstructureddata#Fie...
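In practice, for a Universal Forwarder this means the structured-data settings live in an app on the forwarder itself, not only on the indexer. A minimal sketch of the forwarder-side layout, where the app name "my_csv_inputs" is just a placeholder:

$SPLUNK_HOME/etc/apps/my_csv_inputs/local/inputs.conf      <- the [monitor://...] stanza
$SPLUNK_HOME/etc/apps/my_csv_inputs/local/props.conf       <- INDEXED_EXTRACTIONS and related settings
$SPLUNK_HOME/etc/apps/my_csv_inputs/local/transforms.conf  <- nullQueue transforms, if used

Restart the forwarder after placing the files so the settings take effect.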

This props.conf worked for me; you should just pick the right timezone (TZ) value for this data, and perhaps also dump the Total line. By providing the header line number, I believe you remove the need for a props/transforms pair to dump the header, as we do that automagically:

[sander_csv]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
HEADER_FIELD_LINE_NUMBER=7
FIELD_DELIMITER=;
TZ=UTC
TIMESTAMP_FIELDS=Date,Time


- MattyMo
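To also drop the trailing Total row, the suggestion above could be implemented with a small transforms.conf on the forwarder; a hedged sketch (the stanza name "drop_total_line" is an assumption):

[drop_total_line]
REGEX = ^Total;
DEST_KEY = queue
FORMAT = nullQueue

with a matching reference in the props stanza:

TRANSFORMS-drop_total = drop_total_line

The REGEX matches only lines beginning with "Total;", so the data rows and the retained header line are unaffected.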


DUThibault
Contributor

Please clarify: when you say "you need to make sure you put the props/transforms on the forwarder", do you mean a forwarding Splunk instance, or do you mean a Splunk Universal Forwarder?


mattymo
Splunk Employee
Splunk Employee

Speaking specifically about INDEXED_EXTRACTIONS, it would be any forwarding instance.

- MattyMo

DUThibault
Contributor

So I should copy [Splunk Instance]/opt/splunk/etc/apps/search/local/props.conf and transforms.conf to [Splunk Universal Forwarder]/opt/splunkforwarder/etc/apps/_server_app_<server class>/local/, correct?


mattymo
Splunk Employee
Splunk Employee

Hard to say; not sure what you are trying to do. Maybe start a new Answers post and link me and I'll help you there, or catch me on Slack (splk.it/splunk - my username is @mattymo).

- MattyMo

DUThibault
Contributor

How do I "link you"? I don't see anything resembling that on my original question's page.


mattymo
Splunk Employee
Splunk Employee

just post the link here

- MattyMo

DUThibault
Contributor

sander_vandamme
Path Finder

Thank you! This one is working for me. With your proposed props.conf in combination with the transforms.conf, the "Total" line is also skipped from indexing.


mattymo
Splunk Employee
Splunk Employee

Sweet, what did it? Pushing the props/transforms to the forwarder?

- MattyMo

sander_vandamme
Path Finder

Yes indeed, moved both files to the forwarder and it started to work flawlessly!
Thanks once more!


koshyk
Super Champion

Is the above example data a single event or the whole contents of a file? Just checking, because if "Icon Statistics" occurs again in the same file, it might need line breaker and line-merge-false options.
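If multiple "Icon Statistics" blocks can appear in one file, the props could be extended with explicit line breaking; a sketch under that assumption (standard props.conf settings, not tested against this data):

[sander_csv]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)

SHOULD_LINEMERGE = false prevents multiple physical lines from being merged into one event, and LINE_BREAKER = ([\r\n]+) makes each physical line its own event.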


sander_vandamme
Path Finder

The data above is an example of such a file. In the monitored location (/opt/ect/data/sdp/mail/statistics/SDUP.csv) the same kind of file is exported every 10 minutes (with a different name, of course). The header I am speaking of, which needs to be skipped, has the same structure in every CSV file.
