I've defined a sourcetype for CSV data with a fixed header
and data that looks like:
Date,Color,Data1,Data2
2015-01-30 10:11:12,Red,1.1,1.01
2015-01-30 10:11:12,Green,0,0
2015-01-30 10:11:13,Red,2.2,2.02
2015-01-30 10:11:14,Red,3.3,3.03
...
so the header contains the field names of the sourcetype.
What is the best way to search, using something like
this pseudo-SQL query:
SELECT Color1 WHERE Color=Red
Splunk looks like it can do much more than this but
I'd like to start out simple. I tried queries that I thought
included the clause
...WHERE Color=Red
in Splunk-speak but I couldn't figure out how to reference
the pre-defined columns, because there's no sense looking
for 'Red' in the Date or Data fields.
Thank you.
For your sourcetype, you should be setting you delimiter in props for CSV..
props.conf
[mycsvsourcetype]
HEADER_FIELD_LINE_NUMBER=1
TIMESTAMP_FIELDS=date
FIELD_DELIMITER=,
Your sources need to be consistent in terms of the header existing.
Once indexed correctly, you can search for fields as desired:
sourcetype=mycsvsourcetype Color=red | table Color, Data1,Data2
That will return all events with the field named Color, that has the value of red. Do note, field names are case sensitive. So if your header is "Color" and "Data1", you have to use those fields names.
Thanks for the response. I tried what you mentioned with no luck.
$SPLUNK_HOME/etc/apps/MyApp/local/props.conf
$SPLUNK_HOME/etc/apps/search/local/props.conf
$SPLUNK_HOME/etc/apps/local/props.conf
with a stanza that looks something like:
....
[CST]
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
PREAMBLE_REGEX = ^Date
SHOULD_LINEMERGE = false
category = Custom
description = my log files
disabled = false
pulldown_type = true
FIELD_NAMES = Date,Color,Data1,Data2
TIMESTAMP_FIELDS = Date
TIME_FORMAT = %Y/%m/%d %H:%M:%S
FIELD_DELIMITER=,
HEADER_FIELD_DELIMITER=,
...
Notice I did not include
HEADER_FIELD_LINE_NUMBER=1
because the data will always be coming in - via file or TCP - without
the header.
To restate the situation, it looks like the data coming in on the TCP
port is not being parsed using the custom CST sourcetype I made,
even though I referenced that sourcetype when I created the TCP port input.
CSV's are not TCP inputs. CSV's are flat files that are read into Splunk from the file system and parsed differently because the header applies to the whole file, where as TCP inputs are sent over the network and processed per event and headers are not maintained.
So this changes the whole process. If you try to read in a CSV file, the sourcetype will be different then a TCP input due to the nature of how the file and network flow look.
I recommend creating a sourcetype for the CSV on disk first, and validate that based on the above recommendations. Once that is done and validated, move on to the network input based sourcetype. For that, share how the events look coming over the wire and we can help more.
Ahhh. Makes sense. OK, the sourcetype I created for the
flat file works. The TCP input process essentially is that a
remote host periodically sends lines from a CSV file that is
being tailed to the TCP port. So what is being sent to
the TCP port is a string that looks like it was taken from a CSV file, because the original source is in fact a CSV file. The headers are still fixed, because all lines have the same data layout. So I assumed the sourcetype for the CSV file would work for the TCP port as well.
As a special case of this, I would also like to be able
to load an entire CSV file into Splunk using the TCP port,
rather than reading a local file found on the Splunk server,
and have it automatically parsed using the CST sourcetype
I created.
Thanks for the response. The query works for me some of the time. The situation I seem
to have is that my custom sourcetype, call it CST, works on header-less data that
is read in from a file, but is ignored if the exact same header-less data is read in through
a TCP port. I could understand if it worked for neither or both, but I can't understand
how it can work on only one. By 'work', I mean the column headers, such as 'Color',
appear in the "Interesting Fields" seen from the GUI's search page and this query
returns results:
sourcetype=CST Color=Red
And I assume it doesn't work because the literal string 'Color=Red' does not exist in the data.
Could you please post your search?
sourcetype=csv Color=Red | table Date,color,data2,data2 would give you results where color field is red.
Post what you have and the desired output.
Thanks,
Raghav