Getting Data In

Folder Monitor - CSV not being indexed correctly.

danfinan
Explorer

Hi guys,

I am currently monitoring a folder (recursively) so that the files in the directory/sub-directories are indexed. These files will only ever be .CSV. The issue I have is that Splunk seems to only index the first three lines of the CSV files, the rest is ignored.

Here is my stanza:

[monitor://E:\Reports]
disabled = false
index = reports
recursive = true
host = host_name
sourcetype = csv

And here are the first 10 lines of my CSV... (the rest of the CSV file follows a similar format).

,,Business Name,,
Business Name - Calls Completed Last Week,,,,
"Generated by System Administrator  on : Dec 5, 2019 09:15 AM",,,,
Total records : 110,,,,
"Completed Time : From Nov 24, 2019 12:00 AM To Nov 30, 2019 11:59 PM",,,,
Request ID,Subject,Created Time,DueBy Time,Technician
"Nov 25, 2019",,,,
15624,Url log,"Nov 22, 2019 05:02 PM",Not Assigned,Tom
15625,Url Log - Daily Blocked Words List,"Nov 22, 2019 05:02 PM",Not Assigned,Tom
15629,Url Log - Daily Blocked Words List,"Nov 23, 2019 05:10 PM",Not Assigned,Tom
15630,Url log,"Nov 23, 2019 05:10 PM",Not Assigned,Tom

What else do I need to modify to get Splunk to index the data correctly? As I said, only the first three lines of the CSV have been indexed; the rest are ignored for some reason.

Thanks for your help!

Dan


dindu
Contributor

Hey,

The problem with your data is that the header appears on line 6, and I assume you are interested in the rows from line 7 onward.
First, define a custom sourcetype which parses the data as CSV: Settings --> Source types --> New Source Type.
You have two options to do it.
Please try both and let us know.

Option 01

     [monitor://E:\Reports]
     disabled = false
     index = reports
     recursive = true
     host = host_name
     sourcetype = your_sourcetype

 Please modify the props.conf as below.

    [your_sourcetype]    
    DATETIME_CONFIG = CURRENT            
    HEADER_FIELD_LINE_NUMBER = 6             
    INDEXED_EXTRACTIONS = csv            
    SHOULD_LINEMERGE = false             
    category = Structured           
    disabled = false            
    pulldown_type = true
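
As a quick sanity check outside Splunk, the effect of HEADER_FIELD_LINE_NUMBER = 6 can be simulated in plain Python with the sample rows from your post (a sketch of the header/line logic only, not Splunk's actual parser):

```python
import csv
import io

# Sample lines copied from the question (the real files live under E:\Reports).
raw = """\
,,Business Name,,
Business Name - Calls Completed Last Week,,,,
"Generated by System Administrator  on : Dec 5, 2019 09:15 AM",,,,
Total records : 110,,,,
"Completed Time : From Nov 24, 2019 12:00 AM To Nov 30, 2019 11:59 PM",,,,
Request ID,Subject,Created Time,DueBy Time,Technician
15624,Url log,"Nov 22, 2019 05:02 PM",Not Assigned,Tom
"""

lines = raw.splitlines()
# HEADER_FIELD_LINE_NUMBER = 6 tells Splunk to take line 6 as the header
# and treat everything after it as data rows.
reader = csv.DictReader(io.StringIO("\n".join(lines[5:])))
rows = list(reader)
print(rows[0]["Request ID"])   # 15624
```

Note that the stray subtotal line `"Nov 25, 2019",,,,` in your sample sits between the header and the data, so Option 01 would still index it as a mostly empty row; the null queue in Option 02 can drop lines like that.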

Option 02
Here you use transforms to route the irrelevant lines to the null queue and extract fields from the rest.
Alternatively, you could use sed commands or a bash/shell script to pre-process the files and then index the output.
[The bash/shell script will run, generate the cleaned file, and store it in a different location. Then monitor that location for the files.]

    Please modify the props.conf as below. Note the attribute name must start with TRANSFORMS- (a prefix like 01_ makes it invalid), and a delimiter-based transform is applied at search time via REPORT-:

    [your_sourcetype]
    TRANSFORMS-null = null_queue
    REPORT-csv = transforms_csv


    Please modify the transforms.conf as below
    [null_queue]
    REGEX = your_regex
    DEST_KEY = queue
    FORMAT = nullQueue

    [transforms_csv]
    DELIMS = ","
    FIELDS = "Request ID","Subject","Created Time","DueBy Time","Technician"
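
The `your_regex` placeholder is left for you to fill in. One candidate, assuming real rows always begin with a numeric Request ID, is a negative lookahead that sends everything else to the null queue. A quick Python check against your sample lines (Splunk's REGEX syntax is PCRE, so the same pattern works there):

```python
import re

# Candidate for the your_regex placeholder (an assumption, not the only
# choice): match any line that does NOT begin with a numeric Request ID,
# so everything except real data rows is routed to nullQueue.
null_regex = re.compile(r"^(?!\d+,)")

samples = [
    ',,Business Name,,',                                        # preamble
    'Total records : 110,,,,',                                  # preamble
    '"Nov 25, 2019",,,,',                                       # stray date row
    '15624,Url log,"Nov 22, 2019 05:02 PM",Not Assigned,Tom',   # real data
]

for line in samples:
    verdict = "DROP" if null_regex.match(line) else "KEEP"
    print(verdict, line)
```

This would also drop the header line itself, which is fine here because [transforms_csv] names the fields explicitly via FIELDS.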

oscar84x
Contributor

Have you had any of the files ingested successfully? I'm assuming you need all the information on the header (everything before line 6) indexed as well. If so, is the pattern for the header consistent?
Could you also share your props?
