calvintkng · ‎08-02-2014

I have a CSV file containing thousands of events. Each event is a single line with a date-time stamp and several other fields. I used the Web Data inputs > Files & Directories to select the CSV, and the preview shows that Splunk recognizes the correct timestamp on each line. However, after the CSV is imported, search results show that the timestamp of every event has become the modification date of the CSV file. How can I fix this?

martin_mueller · ‎08-04-2014

Not that strange. TIME_FORMAT and INDEXED_EXTRACTIONS aren't really compatible.

calvintkng · ‎08-04-2014

I found that the problem is solved if I use this props.conf. Strange?

your settings

NO_BINARY_CHECK=1

set by detected source type

INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y%m%d %H%M%S
pulldown_type=1

rsennett_splunk · ‎08-03-2014

For the small sample you have provided, the sourcetype settings are correct. So you might want to edit your question and provide a richer data sample so we can see more of the file.

Check out this list: How Timestamp Assignment Works

1. Splunk looks for a time or date in the event itself using an explicit TIME_FORMAT, if provided. You configure the TIME_FORMAT attribute in props.conf.
2. If no TIME_FORMAT was configured for the data, Splunk Enterprise attempts to automatically identify a time or date in the event itself. It uses the source type of the event (which includes TIME_FORMAT information) to try to find the timestamp.
3. If an event doesn't have a time or date, Splunk Enterprise uses the timestamp from the most recent previous event of the same source.
4. If no events in a source have a date, Splunk Enterprise tries to find one in the source name or file name. (This requires that the events have a time, even though they don't have a date.)
5. For file sources, if no date can be identified in the file name, Splunk Enterprise uses the file's modification time.
6. As a last resort, Splunk Enterprise sets the timestamp to the current system time when indexing each event.

Note: Splunk Enterprise can only extract dates from a source, not times. If you need to extract a time from a source, use a transform.
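
To make the precedence in that list concrete, here is a toy model in Python. It is only a sketch of the fallback order as described above, not Splunk's actual implementation: the function name `assign_timestamp` and its parameters are hypothetical, and the first two steps (explicit TIME_FORMAT and automatic detection) are collapsed into a single "event" candidate.

```python
from datetime import datetime
from typing import Optional, Tuple


def assign_timestamp(
    event_time: Optional[datetime],
    previous_event_time: Optional[datetime],
    source_name_time: Optional[datetime],
    file_mtime: Optional[datetime],
    index_time: datetime,
) -> Tuple[str, datetime]:
    """Return (origin, timestamp) using the first candidate that is available.

    Hypothetical model of the documented fallback order; each argument is
    None when that step found nothing.
    """
    candidates = [
        ("event", event_time),                    # steps 1-2: found in the event
        ("previous event", previous_event_time),  # step 3
        ("source name", source_name_time),        # step 4
        ("file mtime", file_mtime),               # step 5
    ]
    for origin, value in candidates:
        if value is not None:
            return origin, value
    return "index time", index_time               # step 6: last resort


# Illustrative values only: nothing usable in the event, previous event, or file name.
mtime = datetime(2014, 8, 2, 9, 0, 0)
now = datetime(2014, 8, 2, 9, 5, 0)
print(assign_timestamp(None, None, None, mtime, now))
```

In this model, every event ending up with the file's modification time means that none of the earlier candidates produced a usable timestamp, which is the symptom described in the question.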

Splunk will resort to using the timestamp in the source only if you don't give the TIME_FORMAT or if the one you have given seems incorrect. For example, if the majority of events don't match the TIME_FORMAT, Splunk will think "oh, that can't be right", ignore it, and go to the next step on the list.

Note also that if you have a header on these files, you can tell Splunk in which field(s) it will find the timestamp, regardless of the format.

So my assumption at this point is that the file isn't as consistent as you expect. What I would do is use Splunk to show me that first field.

Since the data is in, try this:
sourcetype=YOURSOURCETYPE | rex "(?<mytimestamp>^[^\;]+);" | stats count by mytimestamp

This will help you see what's in there that is convincing Splunk that TIME_FORMAT doesn't match. Then you will have another puzzle. 🙂 If so, I suggest you rephrase your question and create a new entry in "answers".
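
The same consistency check can also be run on the raw file before indexing. The following is a minimal sketch in Python, assuming the first semicolon-delimited field holds the timestamp and that Python's `strptime` treats `%Y%m%d %H%M%S` the same way Splunk does for these directives. The helper name `check_first_field` is made up for illustration, and the third sample line is a deliberately malformed example, not data from the original file.

```python
from collections import Counter
from datetime import datetime

# Same format string as the TIME_FORMAT value used in props.conf.
TIME_FORMAT = "%Y%m%d %H%M%S"


def check_first_field(lines, delimiter=";", time_format=TIME_FORMAT):
    """Count lines whose first field parses, and tally the values that do not."""
    ok = 0
    bad = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        first = line.split(delimiter, 1)[0].strip()
        try:
            datetime.strptime(first, time_format)
            ok += 1
        except ValueError:
            bad[first] += 1  # first field does not match the format
    return ok, bad


sample = [
    "20140101 061101; 28; 29",
    "20140101 061109; 29; 30",
    "2014-01-01 06:12:04; 28; 30",  # hypothetical malformed line
]
ok, bad = check_first_field(sample)
print(ok, dict(bad))
```

To check a real file, pass an open file handle instead of `sample`. A large count in `bad` would support the theory that most events don't match the TIME_FORMAT.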

With Splunk... the answer is always "YES!". It just might require more regex than you're prepared for!

calvintkng · ‎08-03-2014

Thanks rsennett_splunk. So do you mean that, because the file isn't consistent, it works in preview, which only looks at a few lines, but doesn't work after the whole file is imported?

I'll try your command to see what's happening...

martin_mueller · ‎08-03-2014

Have you tried using the TIMESTAMP_FIELDS setting as per http://docs.splunk.com/Documentation/Splunk/6.1.2/Data/Extractfieldsfromfileheadersatindextime?

calvintkng · ‎08-03-2014

Thanks Martin, here is some sample data and the props.conf.

20140101 061101; 28; 29
20140101 061109; 29; 30
20140101 061204; 28; 30
20140101 071204; 29; 30
20140101 071204; 29; 30
20140101 090634; 30; 31
20140101 101107; 31; 32

your settings

FIELD_DELIMITER=;
HEADER_FIELD_DELIMITER=;
INDEXED_EXTRACTIONS=csv
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TIME_FORMAT=%Y%m%d %H%M%S
TZ=Asia/Singapore

set by detected source type

KV_MODE=none
pulldown_type=1

martin_mueller · ‎08-02-2014

Do post some sample data and the props.conf settings for that sourcetype.

Wrong Timestamp of CSV
