I need to ingest a file whose filename contains the year, month, and day, while the exact time of the event within the day (nanoseconds since midnight) is inside the file, in a field called nanos. I'm trying to figure out the best way to combine these two pieces of information into the correct _time field.
Sample file:
Filename: test_csv_parsing_20161011.txt
sequenceNumber,nanos,msgType,poolId,accountId,fixMsgType,fixData
29585650,62733712723932,'*',zzzz,zzzz,54,''
Given the following:
nanos => seconds => hours:mins:secs.nanos
62733712723932 => 62733.712723932 => 17:25:33.712723932
Expected results:
_time = 2016/10/11 17:25:33.712723932
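The nanos-to-time-of-day arithmetic above can be checked outside Splunk; a minimal Python sketch using the sample value from this post:

```python
# Split nanoseconds-since-midnight into whole seconds and a nanosecond remainder.
nanos = 62733712723932

seconds, frac = divmod(nanos, 10**9)   # 62733 s, 712723932 ns
hours, rem = divmod(seconds, 3600)
minutes, secs = divmod(rem, 60)

time_of_day = f"{hours:02d}:{minutes:02d}:{secs:02d}.{frac:09d}"
print(time_of_day)  # 17:25:33.712723932
```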
I have already tried the following:
TIME_FORMAT=%s%9N
TIMESTAMP_FIELDS=nanos
Based on what it says in the "Configure timestamps" Splunk docs, it should take the date from the filename if it can only find the time inside the event: "4. If no events in a source have a date, Splunk software tries to find a date in the source name or file name. Time of day is not identified in filenames. (This requires that the events have a time, even though they don't have a date.)"
But I get the following warnings in the "adddata/datapreview" dashboard: "The TIME_FORMAT specified is matching timestamps (Mon Oct 17 07:07:52 2168) outside of the acceptable time window. If this timestamp is correct, consider adjusting MAX_DAYS_AGO and MAX_DAYS_HENCE." and "Failed to parse timestamp. Defaulting to file modtime."
It's probably because %s is being given priority over the %9N in the TIME_FORMAT. If it first captured the last 9 digits as nanoseconds and then used the rest as seconds, this should work.
Can anybody provide guidance?
I did something similar but it was a little bit tricky.
In my case the source field is "/root/test_csv_parsing_20161011.txt", so I extracted a new field called newdate from the "source" metadata like this in props.conf:
EXTRACT-newdate=\/.*?\/.*?\_(?P<newdate>\d+)\..* in source
Then you can concatenate the fields and send the events to another index with the "collect" command, setting the new _time field:
index=proof | eval seconds=round(nanos*0.000000001) | eval newseconds=strftime(seconds,"%H%M%S") | eval newtimestamp=newdate.newseconds | eval c_time=strptime(newtimestamp,"%Y%m%d%H%M%S") | eval _time=c_time | eval _raw=_time + ": " + _raw | collect index=mynewindex
Maybe this is not the cleanest way to do it but I'm sure it can help you.
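The combination this search performs (date from the filename, time of day from nanos) can be sketched in Python for illustration; this is not the SPL itself, and the values are the samples from this thread. Note that Python's datetime only carries microseconds, so the last three nanosecond digits are dropped, whereas Splunk's %9N keeps them:

```python
from datetime import datetime, timedelta, timezone

newdate = "20161011"      # extracted from the filename
nanos = 62733712723932    # nanoseconds since midnight, from the event

# Midnight (UTC) of the file's date, plus the intra-day offset.
midnight = datetime.strptime(newdate, "%Y%m%d").replace(tzinfo=timezone.utc)
event_time = midnight + timedelta(seconds=nanos / 1e9)

print(event_time.strftime("%Y/%m/%d %H:%M:%S"))  # 2016/10/11 17:25:33
```

One detail worth flagging: the search above uses round() on the seconds, which can push a timestamp one second forward; truncating (as divmod/timedelta effectively do here) avoids that.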
You'll need to customize datetime.xml to parse out the date from the filename
https://answers.splunk.com/answers/172535/extract-date-time-from-source-path-with-custom-dat.html
Thanks @tchen_splunk - I'll give this solution a try and mark it correct if it works.
Unfortunately this doesn't seem to work.
A bug report has been submitted to follow up.
I tried all the answers online, plus my own fixes, and ended up with nothing that worked. In the end I used eval to extract the file date from source.
FYI
I don't know how this line worked for him:
EXTRACT-newdate=\/.*?\/.*?\_(?P<newdate>\d+)\..* in source
but I had to go to transforms.conf to get that working.
props.conf
REPORT-filedate= tranName
transforms.conf
[tranName]
blah blah
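The filename-date extraction that the EXTRACT regex in the earlier answer performs can be reproduced outside Splunk; a quick Python check against the sample source path (in Python the forward slashes don't need escaping):

```python
import re

source = "/root/test_csv_parsing_20161011.txt"

# Same idea as the props.conf EXTRACT: two path segments, then the
# digits between the last underscore and the extension.
pattern = r"/.*?/.*?_(?P<newdate>\d+)\..*"
match = re.search(pattern, source)
print(match.group("newdate"))  # 20161011
```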
Thanks for your reply, but I was hoping to get the _time field properly extracted from the very start. I will attempt @tchen_splunk's solution, and if that doesn't work I'll give your search-time solution a try.
Thanks again.
What does your entire props stanza look like for this sourcetype? Are you using indexed_extractions?
Yes I am
[csv]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TIME_FORMAT=%s%9N
FIELD_NAMES=sequenceNumber,nanos,msgType,poolId,accountId,fixMsgType,fixData
TIMESTAMP_FIELDS=nanos
I think the problem is that %s represents Epoch time, not seconds since midnight.
%s The Unix Epoch Time timestamp, or the number of seconds since the Epoch: 1970-01-01 00:00:00 +0000 (UTC). (1352395800 is Thu Nov 8 09:30:00 2012)
So Splunk is grabbing the first 10 digits as the epoch, i.e. 6273371272, and that resolves to GMT: Mon, 17 Oct 2168 11:07:52.
I don't think there is a common time format variable that represents seconds since midnight.
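The misparse described above is easy to reproduce: treating the leading ten digits of the nanos value as Unix epoch seconds lands in the year 2168, matching the warning in the question. A quick Python illustration:

```python
from datetime import datetime, timezone

nanos_field = "62733712723932"

# %s grabs a plausible epoch value first: the leading 10 digits.
epoch_guess = int(nanos_field[:10])          # 6273371272
parsed = datetime.fromtimestamp(epoch_guess, tz=timezone.utc)
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))  # 2168-10-17 11:07:52
```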
Oh, and is that configured on your forwarder, your indexer, or elsewhere?