Getting Data In

Folder Monitor - CSV not being indexed correctly.

danfinan
Explorer

Hi guys,

I am currently monitoring a folder (recursively) so that the files in the directory/sub-directories are indexed. These files will only ever be .CSV. The issue I have is that Splunk seems to index only the first three lines of each CSV file; the rest is ignored.

Here is my stanza:

[monitor://E:\Reports]
disabled = false
index = reports
recursive = true
host = host_name
sourcetype = csv

And here are the first 10 lines of my CSV... (the rest of the CSV file follows a similar format).

,,Business Name,,
Business Name - Calls Completed Last Week,,,,
"Generated by System Administrator  on : Dec 5, 2019 09:15 AM",,,,
Total records : 110,,,,
"Completed Time : From Nov 24, 2019 12:00 AM To Nov 30, 2019 11:59 PM",,,,
Request ID,Subject,Created Time,DueBy Time,Technician
"Nov 25, 2019",,,,
15624,Url log,"Nov 22, 2019 05:02 PM",Not Assigned,Tom
15625,Url Log - Daily Blocked Words List,"Nov 22, 2019 05:02 PM",Not Assigned,Tom
15629,Url Log - Daily Blocked Words List,"Nov 23, 2019 05:10 PM",Not Assigned,Tom
15630,Url log,"Nov 23, 2019 05:10 PM",Not Assigned,Tom

What else may I need to modify to get Splunk to index the data correctly? As I said, the first three lines of the CSV have been indexed, but the rest are ignored for some reason.

Thanks for your help!

Dan


dindu
Contributor

Hey,

The problem with your base data is that the header appears on line 6, and I assume you are interested in the data from line 7 onwards.
First, define a custom sourcetype that parses the data as CSV (Settings --> Source Types --> New Source Type).
You have two options; please try both and let us know.

Option 01

     [monitor://E:\Reports]
     disabled = false
     index = reports
     recursive = true
     host = host_name
     sourcetype = your_sourcetype

 Please modify the props.conf as below.

    [your_sourcetype]    
    DATETIME_CONFIG = CURRENT            
    HEADER_FIELD_LINE_NUMBER = 6             
    INDEXED_EXTRACTIONS = csv            
    SHOULD_LINEMERGE = false             
    category = Structured           
    disabled = false            
    pulldown_type = true
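
Once the custom sourcetype is applied (note that INDEXED_EXTRACTIONS parsing happens on the instance that reads the file, so this props.conf needs to live on the forwarder if you use one), you can sanity-check the field extractions with a quick search. This is only a sketch, assuming the index and sourcetype names used above:

    index=reports sourcetype=your_sourcetype
    | table "Request ID" Subject "Created Time" "DueBy Time" Technician

If the fields come back populated for the data rows, the header on line 6 is being picked up correctly.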

Option 02
Here you filter out the irrelevant lines at index time and use a field extraction for the data rows, ignoring the rest.
Alternatively, you could use SED commands or a bash/shell script to clean up the files before indexing them.
[The bash/shell script would run, generate the cleaned file, and store it in a different location. You would then monitor that location for the files.]

    Please modify the props.conf as below

    [your_sourcetype]
    TRANSFORMS-null = null_queue
    REPORT-csv = transforms_csv


    Please modify the transforms.conf as below

    [null_queue]
    REGEX = your_regex
    DEST_KEY = queue
    FORMAT = nullQueue

    [transforms_csv]
    DELIMS = ","
    FIELDS = " Request ID","Subject","Created Time","DueBy Time","Technician"

oscar84x
Contributor

Have you had any of the files ingested successfully? I'm assuming you need all the information in the header (everything before line 6) indexed as well. If so, is the pattern for the header consistent?
Could you also share your props?
