I need to ingest a file whose filename contains the year, month, and day, while the exact time of the event within the day (nanoseconds since midnight) is inside the file, in a field called nanos. I'm trying to figure out the best way to combine these two pieces of information into the correct _time field.
Sample file:
Filename: test_csv_parsing_20161011.txt
sequenceNumber,nanos,msgType,poolId,accountId,fixMsgType,fixData
29585650,62733712723932,'*',zzzz,zzzz,54,''
Given the following:
nanos => seconds => hours:mins:secs.nanos
62733712723932 => 62733.712723932 => 17:25:33.712723932
Expected results:
_time = 2016/10/11 17:25:33.712723932
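The nanos-to-time-of-day arithmetic above can be checked outside Splunk; a minimal Python sketch using the sample value from this post:

```python
# Split nanoseconds-since-midnight into whole seconds and a nanosecond remainder.
nanos = 62733712723932

seconds, frac = divmod(nanos, 10**9)   # 62733 s, 712723932 ns
hours, rem = divmod(seconds, 3600)
minutes, secs = divmod(rem, 60)

time_of_day = f"{hours:02d}:{minutes:02d}:{secs:02d}.{frac:09d}"
print(time_of_day)  # 17:25:33.712723932
```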
I have already tried the following:
TIME_FORMAT=%s%9N
TIMESTAMP_FIELDS=nanos
Based on what it says in the "Configure timestamps" Splunk docs, it should take the date from the filename if it can only find the time inside the event: "4. If no events in a source have a date, Splunk software tries to find a date in the source name or file name. Time of day is not identified in filenames. (This requires that the events have a time, even though they don't have a date.)"
But I get the following warnings in the "adddata/datapreview" dashboard: "The TIME_FORMAT specified is matching timestamps (Mon Oct 17 07:07:52 2168) outside of the acceptable time window. If this timestamp is correct, consider adjusting MAX_DAYS_AGO and MAX_DAYS_HENCE." and "Failed to parse timestamp. Defaulting to file modtime."
It's probably because %s is being given priority over the %9N in the TIME_FORMAT. If it first captured the last 9 digits as nanoseconds and then used the rest as seconds, this should work.
Can anybody provide guidance?
I did something similar but it was a little bit tricky.
In my case the source field is "/root/test_csv_parsing_20161011.txt", so I extracted a new field called newdate from the "source" metadata like this in props.conf:
EXTRACT-newdate=\/.*?\/.*?\_(?P<newdate>\d+)\..* in source
Then you can concatenate the fields and send the events to another index with the "collect" command, setting the new _time field:
index=proof | eval seconds=round(nanos*0.000000001) | eval newseconds=strftime(seconds,"%H%M%S") | eval newtimestamp=newdate.newseconds | eval c_time=strptime(newtimestamp,"%Y%m%d%H%M%S") | eval _time=c_time | eval _raw=_time + ": " + _raw | collect index=mynewindex
Maybe this is not the cleanest way to do it but I'm sure it can help you.
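The combination this search performs (date from the filename, time of day from nanos) can be sketched in Python for illustration; this is not the SPL itself, and the values are the samples from this thread. Note that Python's datetime only carries microseconds, so the last three nanosecond digits are dropped, whereas Splunk's %9N keeps them:

```python
from datetime import datetime, timedelta, timezone

newdate = "20161011"      # extracted from the filename
nanos = 62733712723932    # nanoseconds since midnight, from the event

# Midnight (UTC) of the file's date, plus the intra-day offset.
midnight = datetime.strptime(newdate, "%Y%m%d").replace(tzinfo=timezone.utc)
event_time = midnight + timedelta(seconds=nanos / 1e9)

print(event_time.strftime("%Y/%m/%d %H:%M:%S"))  # 2016/10/11 17:25:33
```

One detail worth flagging: the search above uses round() on the seconds, which can push a timestamp one second forward; truncating (as divmod/timedelta effectively do here) avoids that.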
You'll need to customize datetime.xml to parse out the date from the filename
https://answers.splunk.com/answers/172535/extract-date-time-from-source-path-with-custom-dat.html
Thanks @tchen_splunk - I'll give this solution a try and mark it correct if it works.
Unfortunately this doesn't seem to work.
A bug report has been submitted to follow up.
I tried all the answers online, plus my own fixes, and ended up with nothing that worked. In the end I used eval to extract the file date from source.
FYI
I don't know how this line worked for him:
EXTRACT-newdate=\/.*?\/.*?\_(?P<newdate>\d+)\..* in source
but I had to go to transforms.conf to get that working.
props.conf
REPORT-filedate= tranName
transforms.conf
[tranName]
blah blah
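The filename-date extraction that the EXTRACT regex in the earlier answer performs can be reproduced outside Splunk; a quick Python check against the sample source path (in Python the forward slashes don't need escaping):

```python
import re

source = "/root/test_csv_parsing_20161011.txt"

# Same idea as the props.conf EXTRACT: two path segments, then the
# digits between the last underscore and the extension.
pattern = r"/.*?/.*?_(?P<newdate>\d+)\..*"
match = re.search(pattern, source)
print(match.group("newdate"))  # 20161011
```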
Thanks for your reply, but I was hoping to get the _time field properly extracted from the very start. I will attempt @tchen_splunk's solution, and if that doesn't work I'll give your search-time solution a try.
Thanks again.
What does your entire props stanza look like for this sourcetype? Are you using indexed_extractions?
Yes I am
[csv]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TIME_FORMAT=%s%9N
FIELD_NAMES=sequenceNumber,nanos,msgType,poolId,accountId,fixMsgType,fixData
TIMESTAMP_FIELDS=nanos
I think the problem is that %s represents Epoch time, not seconds since midnight.
%s The Unix Epoch Time timestamp, or the number of seconds since the Epoch: 1970-01-01 00:00:00 +0000 (UTC). (1352395800 is Thu Nov 8 09:30:00 2012)
So Splunk is grabbing the first 10 digits as the epoch, i.e. 6273371272, and that resolves to GMT: Mon, 17 Oct 2168 11:07:52.
I don't think there is a common time format variable that represents seconds since midnight.
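The misparse described above is easy to reproduce: treating the leading ten digits of the nanos value as Unix epoch seconds lands in the year 2168, matching the warning in the question. A quick Python illustration:

```python
from datetime import datetime, timezone

nanos_field = "62733712723932"

# %s grabs a plausible epoch value first: the leading 10 digits.
epoch_guess = int(nanos_field[:10])          # 6273371272
parsed = datetime.fromtimestamp(epoch_guess, tz=timezone.utc)
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))  # 2168-10-17 11:07:52
```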
Oh, and is that configured on your forwarder, your indexer, or elsewhere?