Getting Data In

best practices for search against CSV data with a fixed header?

drmark
New Member

I've defined a sourcetype for CSV data with a fixed header
and data that looks like:

Date,Color,Data1,Data2
2015-01-30 10:11:12,Red,1.1,1.01
2015-01-30 10:11:12,Green,0,0
2015-01-30 10:11:13,Red,2.2,2.02
2015-01-30 10:11:14,Red,3.3,3.03
...

so the header contains the field names of the sourcetype.
What is the best way to search, using something like
this pseudo-SQL query:

SELECT Color1 WHERE Color=Red

Splunk looks like it can do much more than this but
I'd like to start out simple. I tried queries that I thought
included the clause

...WHERE Color=Red

in Splunk-speak but I couldn't figure out how to reference
the pre-defined columns, because there's no sense looking
for 'Red' in the Date or Data fields.

Thank you.

Tags (2)
0 Karma

esix_splunk
Splunk Employee
Splunk Employee

For your sourcetype, you should be setting you delimiter in props for CSV..

props.conf
[mycsvsourcetype]
HEADER_FIELD_LINE_NUMBER=1
TIMESTAMP_FIELDS=date
FIELD_DELIMITER=,

Your sources need to be consistent in terms of the header existing.

Once indexed correctly, you can search for fields as desired:

sourcetype=mycsvsourcetype Color=red | table Color, Data1,Data2

That will return all events with the field named Color, that has the value of red. Do note, field names are case sensitive. So if your header is "Color" and "Data1", you have to use those fields names.

0 Karma

drmark
New Member

Thanks for the response. I tried what you mentioned with no luck.

  • when you say '... props for CSV..' which file do you mean? In my installation I have eight different 'props.conf' files. From your instructions I updated these files

$SPLUNK_HOME/etc/apps/MyApp/local/props.conf
$SPLUNK_HOME/etc/apps/search/local/props.conf
$SPLUNK_HOME/etc/apps/local/props.conf

with a stanza that looks something like:

....
[CST]
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
PREAMBLE_REGEX = ^Date
SHOULD_LINEMERGE = false
category = Custom
description = my log files
disabled = false
pulldown_type = true
FIELD_NAMES = Date,Color,Data1,Data2
TIMESTAMP_FIELDS = Date
TIME_FORMAT = %Y/%m/%d %H:%M:%S
FIELD_DELIMITER=,
HEADER_FIELD_DELIMITER=,
...

Notice I did not include

HEADER_FIELD_LINE_NUMBER=1

because the data will always be coming in - via file or TCP - without
the header.

0 Karma

drmark
New Member

To restate the situation, it looks like the data coming in on the TCP
port is not being parsed using the custom CST sourcetype I made,
even though I referenced that sourcetype when I created the TCP port input.

0 Karma

esix_splunk
Splunk Employee
Splunk Employee

CSV's are not TCP inputs. CSV's are flat files that are read into Splunk from the file system and parsed differently because the header applies to the whole file, where as TCP inputs are sent over the network and processed per event and headers are not maintained.

So this changes the whole process. If you try to read in a CSV file, the sourcetype will be different then a TCP input due to the nature of how the file and network flow look.

I recommend creating a sourcetype for the CSV on disk first, and validate that based on the above recommendations. Once that is done and validated, move on to the network input based sourcetype. For that, share how the events look coming over the wire and we can help more.

0 Karma

drmark
New Member

Ahhh. Makes sense. OK, the sourcetype I created for the
flat file works. The TCP input process essentially is that a
remote host periodically sends lines from a CSV file that is
being tailed to the TCP port. So what is being sent to
the TCP port is a string that looks like it was taken from a CSV file, because the original source is in fact a CSV file. The headers are still fixed, because all lines have the same data layout. So I assumed the sourcetype for the CSV file would work for the TCP port as well.

0 Karma

drmark
New Member

As a special case of this, I would also like to be able
to load an entire CSV file into Splunk using the TCP port,
rather than reading a local file found on the Splunk server,
and have it automatically parsed using the CST sourcetype
I created.

0 Karma

drmark
New Member

Thanks for the response. The query works for me some of the time. The situation I seem
to have is that my custom sourcetype, call it CST, works on header-less data that
is read in from a file, but is ignored if the exact same header-less data is read in through
a TCP port. I could understand if it worked for neither or both, but I can't understand
how it can work on only one. By 'work', I mean the column headers, such as 'Color',
appear in the "Interesting Fields" seen from the GUI's search page and this query
returns results:

sourcetype=CST Color=Red

And I assume it doesn't work because the literal string 'Color=Red' does not exist in the data.

0 Karma

Raghav2384
Motivator

Could you please post your search?

sourcetype=csv Color=Red | table Date,color,data2,data2 would give you results where color field is red.

Post what you have and the desired output.

Thanks,
Raghav

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...