I have a CSV file containing thousands of events. Each event is a single line with a date-time stamp and several other fields. In Splunk Web I use Data inputs > Files & Directories to select the CSV, and the preview shows that Splunk recognizes the correct timestamp on each line. However, after the CSV is imported, search results show that the timestamp of every event has become the modification date of the CSV file. How can I fix this?
Not that strange. TIME_FORMAT and INDEXED_EXTRACTIONS aren't really compatible.
I found that the problem can be solved if I use this props.conf. Strange?
NO_BINARY_CHECK=1
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y%m%d %H%M%S
pulldown_type=1
For the small sample you have provided, the sourcetype settings are correct. So you might want to edit your answer, and provide a richer data sample so we can see more of the file.
Check out this list: How Timestamp Assignment Works
Splunk looks for a time or date in the event itself using an explicit TIME_FORMAT, if provided. You configure the TIME_FORMAT attribute in props.conf.
If no TIME_FORMAT was configured for the data, Splunk Enterprise attempts to automatically identify a time or date in the event itself. It uses the source type of the event (which includes TIME_FORMAT information) to try to find the timestamp.
If an event doesn't have a time or date, Splunk Enterprise uses the timestamp from the most recent previous event of the same source.
If no events in a source have a date, Splunk Enterprise tries to find one in the source name or file name. (This requires that the events have a time, even though they don't have a date.)
For file sources, if no date can be identified in the file name, Splunk Enterprise uses the file's modification time.
As a last resort, Splunk Enterprise sets the timestamp to the current system time when indexing each event.
Note: Splunk Enterprise can only extract dates from a source, not times. If you need to extract a time from a source, use a transform.
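To make the fallback order in the list above concrete, here is a minimal Python sketch. It only illustrates the precedence described in the list and is not how Splunk is actually implemented; the function name `pick_timestamp` and all of its parameters are hypothetical, and steps 1 and 2 (explicit TIME_FORMAT and automatic detection) are collapsed into a single `event_time` input.

```python
from datetime import datetime
from typing import Optional, Tuple


def pick_timestamp(
    event_time: Optional[datetime],
    previous_event_time: Optional[datetime],
    source_name_date: Optional[datetime],
    file_mtime: Optional[datetime],
    index_time: datetime,
) -> Tuple[str, datetime]:
    """Return (origin, timestamp) using the first candidate that is available.

    Hypothetical helper: each argument stands for the result of one step in
    the list above, or None if that step found nothing.
    """
    candidates = [
        ("event", event_time),                      # steps 1-2: found in the event
        ("previous event", previous_event_time),    # step 3
        ("source name", source_name_date),          # step 4
        ("file modification time", file_mtime),     # step 5
    ]
    for origin, value in candidates:
        if value is not None:
            return origin, value
    # Step 6: last resort, the time at which the event is indexed.
    return "index time", index_time


# The symptom in the question: nothing usable was found in the events or the
# file name, so the file's modification time wins.
mtime = datetime(2014, 7, 1, 12, 0, 0)
print(pick_timestamp(None, None, None, mtime, datetime(2014, 7, 2)))
```

Read this way, events stamped with the file's modification date suggest that every earlier step failed for them, which is why the timestamp extraction settings are the place to look.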
Splunk will resort to using the timestamp from the source only if you don't give a TIME_FORMAT or if the one you have given seems incorrect. For example, if the majority of events don't match the TIME_FORMAT, Splunk will think "oh, that can't be right", ignore it, and go to the next step on the list.
Note also that if you have a header on these files, you can tell Splunk in which field(s) it will find the timestamp, regardless of the format.
So my assumption at this point is that the file isn't as consistent as you expect. What I would do is use Splunk to show me that first field.
Since the data is in, try this:
sourcetype=YOURSOURCETYPE |rex "(?<mytimestamp>^[^\;]+);"| stats count by mytimestamp
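If you would rather check the file before re-indexing it, the same consistency check can be sketched offline in Python. This assumes the timestamp is the first `;`-separated field and is supposed to match `%Y%m%d %H%M%S`; the function name `find_bad_timestamps` is hypothetical, the third sample line is made up for illustration, and Python's `strptime` is not guaranteed to behave exactly like Splunk's timestamp parser.

```python
from datetime import datetime

TIME_FORMAT = "%Y%m%d %H%M%S"  # same strptime-style format as in props.conf


def find_bad_timestamps(lines, delimiter=";", time_format=TIME_FORMAT):
    """Return (line_number, first_field) for lines whose first field does not parse."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        first_field = line.split(delimiter, 1)[0].strip()
        try:
            datetime.strptime(first_field, time_format)
        except ValueError:
            bad.append((number, first_field))
    return bad


sample = [
    "20140101 061101; 28; 29",
    "20140101 061109; 29; 30",
    "2014-01-01 06:12:04; 28; 30",  # made-up inconsistent line, not from the real file
]
print(find_bad_timestamps(sample))  # [(3, '2014-01-01 06:12:04')]
```

To run it against the real file, pass an open file object instead of the list, for example `find_bad_timestamps(open("events.csv"))`, where `events.csv` is a placeholder name. An empty result would suggest the timestamps are consistent and the cause lies in the sourcetype settings instead.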
Thanks, rsennett_splunk. So do you mean that because the file isn't consistent, it works in preview, which only looks at a few lines, but doesn't work after the whole file is imported?
I'll try your command to see what's happening.....
Have you tried using the TIMESTAMP_FIELDS setting as per http://docs.splunk.com/Documentation/Splunk/6.1.2/Data/Extractfieldsfromfileheadersatindextime?
Thanks Martin, here is some sample data and the props.conf.
20140101 061101; 28; 29
20140101 061109; 29; 30
20140101 061204; 28; 30
20140101 071204; 29; 30
20140101 071204; 29; 30
20140101 090634; 30; 31
20140101 101107; 31; 32
FIELD_DELIMITER=;
HEADER_FIELD_DELIMITER=;
INDEXED_EXTRACTIONS=csv
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y%m%d %H%M%S
TZ=Asia/Singapore
KV_MODE=none
pulldown_type=1
Do post some sample data and the props.conf settings for that sourcetype.