Hello,
I am trying to index a csv log file that looks like this:
Description,NumJobWaitEvents,ReturnCode,RunEnd,RunStart,ScheduledStartTime,Status
Job.Description,Job.NumJobWaitEvents,Job.ReturnCode,Job.RunEnd,Job.RunStart,Job.ScheduledStartTime,Job.Status
String,Integer,Integer,DateTime,DateTime,DateTime,enum.JobStatus
Auto Start,0,null,"2017/03/05 06:03:39,441","2017/03/05 06:01:39,269","2017/03/05 06:01:39,065",Completed
Auto Start,0,null,"2017/03/05 06:09:04,493","2017/03/05 06:06:23,915","2017/03/05 06:06:23,743",Completed
AG43_542_TINA_CODE_AGB - Checking,1,null,"2017/03/05 06:32:18,908","2017/03/05 06:23:15,148","2017/03/05 06:23:14,822",Completed
DATA SANITY CHECK,0,null,"2017/03/05 09:02:23,997","2017/03/05 09:00:44,073","2017/03/05 09:00:42,959",Completed
The first line always contains the header, the second and third lines always contain object and type information, and the log data always starts from the fourth line.
When I index the file as it is, it only indexes the first two lines even though there are thousands. My question is: how can I skip the second and third lines so I can index the actual log data?
Thank you and best regards,
Andrew
Try this:
[MY_SOURCETYPE]
FIELD_DELIMITER = ,
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = CSV
PREAMBLE_REGEX = (^|[\r\n])(Job\.Description[^\r\n]+|String[^\r\n]+)
TIMESTAMP_FIELDS = RunStart
category = Structured
description = Comma-separated value format. Set header and other settings in "Delimited Settings"
Also, IMHO, events that are "durationful" (i.e. contain start and end time details) should always use the end time as the timestamp. For just one reason, think about what your timechart would look like if your system crashed and all events ended at the same time.
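To make the start/end distinction concrete, here is a minimal Python sketch (outside Splunk, purely illustrative) that parses the RunStart and RunEnd values from the first data row of the sample file and computes the job duration. The format string is an assumption read off the sample rows, and the names TS_FORMAT and parse_ts are made up for this example.

```python
from datetime import datetime

# Timestamp layout as it appears in the sample rows, e.g.
# "2017/03/05 06:03:39,441" (a comma precedes the milliseconds).
TS_FORMAT = "%Y/%m/%d %H:%M:%S,%f"


def parse_ts(value: str) -> datetime:
    """Parse one RunStart/RunEnd value from the sample log."""
    return datetime.strptime(value, TS_FORMAT)


# RunStart and RunEnd of the first data row in the sample file.
run_start = parse_ts("2017/03/05 06:01:39,269")
run_end = parse_ts("2017/03/05 06:03:39,441")

# Duration of the job in seconds.
duration = (run_end - run_start).total_seconds()
print(f"start={run_start.isoformat()} end={run_end.isoformat()} duration={duration:.3f}s")
```

Whichever field is chosen as the event timestamp, the other one stays available as a regular field, so the duration can still be derived.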
Thanks for the suggestion, but unfortunately it doesn't work. I think I see what you're getting at, though: you're trying to create one expression that covers both lines, right? I'm not too proficient with regexes.
I'll keep playing with it, thanks!
Andrew
Yes, and make it flexible enough to work whether it is presented with the entire event or just a single line. That really should have done it.
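As a rough check of that claim, the sketch below runs the suggested pattern in Python against the first four lines of the sample file, once line by line and once against the whole text. This only approximates what Splunk does: Splunk uses its own PCRE-based engine, and how it feeds lines to PREAMBLE_REGEX is not shown here, so treat the result as a sanity check of the expression itself.

```python
import re

# Pattern copied from the suggested props.conf stanza.
PREAMBLE = re.compile(r"(^|[\r\n])(Job\.Description[^\r\n]+|String[^\r\n]+)")

# First four lines of the sample file: header, object line, type line, one data row.
lines = [
    "Description,NumJobWaitEvents,ReturnCode,RunEnd,RunStart,ScheduledStartTime,Status",
    "Job.Description,Job.NumJobWaitEvents,Job.ReturnCode,Job.RunEnd,Job.RunStart,Job.ScheduledStartTime,Job.Status",
    "String,Integer,Integer,DateTime,DateTime,DateTime,enum.JobStatus",
    'Auto Start,0,null,"2017/03/05 06:03:39,441","2017/03/05 06:01:39,269","2017/03/05 06:01:39,065",Completed',
]

# Case 1: the pattern is applied to each line on its own.
per_line = [bool(PREAMBLE.search(line)) for line in lines]

# Case 2: the pattern is applied to the whole text at once.
whole_text = "\n".join(lines)
whole = [m.group(2) for m in PREAMBLE.finditer(whole_text)]

print(per_line)
print(whole)
```

In both cases only the object line and the type line are matched, and the real header on the first line is left alone.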
Just to make sure that I'm following the right procedure I'm going to list out the steps I've followed:
Edit props.conf located in SPLUNKHOME\etc\apps\MY_APP\local to contain
[MY_SOURCETYPE]
FIELD_DELIMITER = ,
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = csv
PREAMBLE_REGEX = (^|[\r\n])(Job.Description[^\r\n]+|String[^\r\n]+)
TIMESTAMP_FIELDS = RunStart
category = Structured
description = Comma-separated value format. Set header and other settings in "Delimited Settings"
Restart splunkd via cmd: net stop splunkd, then net start splunkd
Once up, log into Splunk (6.5.2 btw) and enter my app
From the Settings menu, select Add Data
Select upload
Select the csv that contains the data above
Select Next
From the Source type list, select MY_SOURCETYPE
At this point, the first two lines of the event list are as follows
If the regex works as planned, would I see those two lines at that point?
Best regards,
Andrew
Hi Andrew,
Many greetings. We were colleagues and shared a lot of Splunk information. I have a very similar problem to the one you described. Did you solve it in the meantime?
I wish you all the best.
Michal Spisiak
If everything is working, you should not see those lines. HOWEVER, I have never used the Add Data wizard with INDEXED_EXTRACTIONS before.
Check out: http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/Propsconf
The section: Structured Data Header Extraction and configuration
PREAMBLE_REGEX =
* Some files contain preamble lines. This attribute specifies a regular
expression which allows Splunk to ignore these preamble lines, based on
the pattern specified.
Thanks, I'll take a look. One question: will this allow me to read the first line as the headers and only ignore the second and third lines?
Yes, exactly.
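For illustration, the intended end result (line 1 as the header, the object and type lines dropped, everything else parsed as data) can be emulated outside Splunk with a short Python sketch. This is not how Splunk implements PREAMBLE_REGEX internally; the function name read_jobs and the line-by-line filtering are assumptions made for this example.

```python
import csv
import io
import re

# Sample file content from the question (header, object line, type line, data rows).
SAMPLE = '''Description,NumJobWaitEvents,ReturnCode,RunEnd,RunStart,ScheduledStartTime,Status
Job.Description,Job.NumJobWaitEvents,Job.ReturnCode,Job.RunEnd,Job.RunStart,Job.ScheduledStartTime,Job.Status
String,Integer,Integer,DateTime,DateTime,DateTime,enum.JobStatus
Auto Start,0,null,"2017/03/05 06:03:39,441","2017/03/05 06:01:39,269","2017/03/05 06:01:39,065",Completed
Auto Start,0,null,"2017/03/05 06:09:04,493","2017/03/05 06:06:23,915","2017/03/05 06:06:23,743",Completed
AG43_542_TINA_CODE_AGB - Checking,1,null,"2017/03/05 06:32:18,908","2017/03/05 06:23:15,148","2017/03/05 06:23:14,822",Completed
DATA SANITY CHECK,0,null,"2017/03/05 09:02:23,997","2017/03/05 09:00:44,073","2017/03/05 09:00:42,959",Completed
'''

# Same expression as the suggested PREAMBLE_REGEX.
PREAMBLE = re.compile(r"(^|[\r\n])(Job\.Description[^\r\n]+|String[^\r\n]+)")


def read_jobs(text: str) -> list:
    """Use line 1 as the header, drop preamble lines, parse the rest as CSV."""
    raw = text.splitlines()
    header, body = raw[0], raw[1:]
    kept = [line for line in body if not PREAMBLE.search(line)]
    reader = csv.DictReader(io.StringIO("\n".join([header] + kept)))
    return list(reader)


rows = read_jobs(SAMPLE)
for row in rows:
    print(row["Description"], row["RunStart"], row["Status"])
```

The csv module keeps the quoted timestamps intact even though they contain a comma, so each of the four data rows ends up with exactly seven fields.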
Thanks @woodcock
I've been experimenting but I can't get it to work. I've added PREAMBLE_REGEX = ^Job\.Description.*|String.* (which works on https://regex101.com/) and HEADER_FIELD_LINE_NUMBER = 1, but it doesn't seem to be working. I am performing a manual import, selecting MY_SOURCETYPE, which is defined in my props.conf as follows:
[MY_SOURCETYPE]
AUTO_KV_JSON = 1
DATETIME_CONFIG =
FIELD_DELIMITER = ,
HEADER_FIELD_LINE_NUMBER = 1
INDEXED_EXTRACTIONS = csv
KV_MODE = none
NO_BINARY_CHECK = true
PREAMBLE_REGEX = ^Job\.Description.*|String.*
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = RunStart
category = Structured
description = Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true
Are there any other configurations that I should be aware of?
Best regards,
Andrew
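One property of the expression posted above is worth noting, although it may well not be the reason the import fails: in ^Job\.Description.*|String.* the alternation splits the whole pattern, so the ^ anchor applies only to the first branch. The Python sketch below compares it with a grouped variant; the grouped pattern and the "Check String Table" row are hypothetical and do not come from the original file or from the Splunk documentation.

```python
import re

# Pattern as posted: "^" anchors only the first branch, so "String.*"
# is free to match anywhere inside a line.
POSTED = re.compile(r"^Job\.Description.*|String.*")
# Grouped variant (an assumption about the intent): both branches are anchored.
GROUPED = re.compile(r"^(?:Job\.Description|String).*")

object_line = "Job.Description,Job.NumJobWaitEvents,Job.ReturnCode,Job.RunEnd,Job.RunStart,Job.ScheduledStartTime,Job.Status"
type_line = "String,Integer,Integer,DateTime,DateTime,DateTime,enum.JobStatus"
# Hypothetical data row (NOT in the original file) whose description contains "String".
tricky_row = 'Check String Table,0,null,"2017/03/05 06:03:39,441","2017/03/05 06:01:39,269","2017/03/05 06:01:39,065",Completed'

results = {}
for name, pattern in (("posted", POSTED), ("grouped", GROUPED)):
    results[name] = [
        bool(pattern.search(line)) for line in (object_line, type_line, tricky_row)
    ]
    print(name, results[name])
```

Both patterns match the object and type lines, but only the posted one also matches the hypothetical data row, which could matter if a job description ever contains the word "String".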