Splunk Enterprise Security

How to remove unwanted data while indexing a CSV file?

pbankar
Path Finder

I have a CSV file that contains extra, non-CSV lines at the start and at the end of the file.
For example:

----BEGIN_RESPONSE_BODY_CSV
"Date","Action","Module","Details","User Name","User Role","User IP"
"2019-12-30T05:41:34Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:40:55Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:40:12Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
"2019-12-30T05:39:53Z","request","auth","API: active_user","xxxx","Manager","10.10.10.10"
----END_RESPONSE_BODY_CSV
----BEGIN_RESPONSE_FOOTER_CSV
WARNING
"CODE","TEXT","URL"
"1980","10 record limit exceeded. Use URL to get next batch of results.","/api/?action=list&truncation_limit=10&id_max=1111111"
----END_RESPONSE_FOOTER_CSV

I need to index only the CSV data. I need to remove the first line

 ----BEGIN_RESPONSE_BODY_CSV

and every line from

 ----END_RESPONSE_BODY_CSV

through

 ----END_RESPONSE_FOOTER_CSV

Please suggest what I should change in props.conf, or any other conf file, to remove the unwanted data.
My props.conf:

[ csv ]
SHOULD_LINEMERGE=false
LINE_BREAKER=([\r\n]+)
NO_BINARY_CHECK=true
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TZ=UTC
TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ

jkat54
SplunkTrust

In your props.conf:

[yoursourcetype]
SEDCMD-beginAndend = s/^-{4}.*//g

The above removes lines that start with ----.

But then you'll have two CSVs stuck together, each with its own header line. Do you need the data in the footer response too?

You won't be able to use INDEXED_EXTRACTIONS. You'll need to create a REPORT in props.conf and a matching stanza in transforms.conf to extract the fields.
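
As a hedged sketch of how the unwanted lines could be dropped at index time, rather than blanked out with SEDCMD, they can be routed to nullQueue. This assumes each line is already its own event (SHOULD_LINEMERGE=false), that INDEXED_EXTRACTIONS is not set for the sourcetype, and that the configuration lives where parsing happens (an indexer or heavy forwarder). The class name "dropwrapper" and the stanza name "drop_non_body_lines" are made up for illustration, and the regex is written only against the sample shown in the question, so it should be tested against real data first.

```ini
# props.conf
[yoursourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# "dropwrapper" is a hypothetical class name
TRANSFORMS-dropwrapper = drop_non_body_lines

# transforms.conf
[drop_non_body_lines]
# Matches the ----BEGIN/----END marker lines, the bare WARNING line,
# the footer header, and footer rows whose first column is all digits.
# Body rows start with a timestamp ("2019-12-30T..."), so the last
# alternative does not match them.
REGEX = ^(?:----|WARNING$|"CODE","TEXT","URL"|"\d+",")
DEST_KEY = queue
FORMAT = nullQueue
```

With this sketch the body header line ("Date","Action",...) is still indexed as an ordinary event; it can be added to the regex if it is not wanted. The fields then have to be extracted at search time.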


to4kawa
Ultra Champion

props.conf:

    [ csv ]
     SHOULD_LINEMERGE=false
     LINE_BREAKER=([\r\n]+)
     NO_BINARY_CHECK=true
     CHARSET=UTF-8
     INDEXED_EXTRACTIONS=csv
     KV_MODE=none
     category=Structured
     description=Comma-separated value format. Set header and other settings in "Delimited Settings"
     disabled=false
     pulldown_type=true
     TZ=UTC
     TIME_FORMAT=%Y-%m-%dT%H:%M:%SZ
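     # REPORT- applies these transforms at search time; TRANSFORMS- is index-time and would also need FORMAT
     REPORT-csv=response_body_csv, response_footer_csv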

transforms.conf:

    [response_body_csv]
    REGEX=\"(?<Date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\",\"(?<Action>\w+)\",\"(?<Module>\w+)\",\"(?<Details>\w+: \w+)\",\"(?<User_Name>\w+)\",\"(?<User_Role>\w+)\",\"(?<User_IP>[\w.:]+)\"
    [response_footer_csv]
    REGEX=\"(?<CODE>\d{4})\",\"(?<TEXT>.+)\",\"(?<URL>.+)\"

If you want to extract the fields at search time instead, it looks like this:


to4kawa
Ultra Champion

| makeresults
| eval _raw="----BEGIN_RESPONSE_BODY_CSV
\"Date\",\"Action\",\"Module\",\"Details\",\"User Name\",\"User Role\",\"User IP\"
\"2019-12-30T05:41:34Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:40:55Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:40:12Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
\"2019-12-30T05:39:53Z\",\"request\",\"auth\",\"API: active_user\",\"xxxx\",\"Manager\",\"10.10.10.10\"
----END_RESPONSE_BODY_CSV
----BEGIN_RESPONSE_FOOTER_CSV
WARNING
\"CODE\",\"TEXT\",\"URL\"
\"1980\",\"10 record limit exceeded. Use URL to get next batch of results.\",\"/api/?action=list&truncation_limit=10&id_max=1111111\"
----END_RESPONSE_FOOTER_CSV"
| rex mode=sed "s/(?sm)^(\-.+?|WARNING)$/#/g"
| rex mode=sed "s/\"//g"
| makemv delim="#" _raw
| stats count by _raw
| multikv forceheader=1
| where match(_raw,".+")
| fields - *count _raw
| rex field=Date mode=sed "s/Z$/-0000/"
| eval Date=strptime(Date,"%FT%T%z")
| fieldformat Date=strftime(Date,"%F %T")

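This is only a search-time sample; I understand that what you actually want is a setting in props.conf.
multikv does not work properly with the double quotes, so I removed them.
The Date field is also modified so that it is recognized as UTC.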


pbankar
Path Finder

to4kawa, thanks for the input. I'm looking for a way to fix this at index time, using a configuration file. Is that possible?


to4kawa
Ultra Champion

I'm not an expert, so I don't know.
But I think you may be able to use these regular expressions.

See "Configure event line breaking" in the Splunk documentation.
