Getting Data In

How to combine the year, month, and day from a filename with the exact time of the event (in nanoseconds) inside the file?

aholzer
Motivator

I need to ingest a file whose filename contains the year, month, and day, while the exact time of the event within that day (nanoseconds since midnight) is inside the file, in a field called nanos. I'm trying to figure out the best way to combine these two pieces of information to get the correct _time field.

Sample file:
Filename: test_csv_parsing_20161011.txt

sequenceNumber,nanos,msgType,poolId,accountId,fixMsgType,fixData
29585650,62733712723932,'*',zzzz,zzzz,54,''

Given the following:

nanos => seconds => hours:mins:secs.nanos
62733712723932 => 62733.712723932 => 17:25:33.712723932

Expected results:

_time = 2016/10/11 17:25:33.712723932
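As a quick sanity check of that conversion, eval's duration formatting reproduces the expected time of day (a search-time sketch using the sample value above):

| makeresults
| eval nanos=62733712723932
| eval secs=floor(nanos/1000000000), frac=nanos%1000000000
| eval hms=tostring(secs, "duration")

Here hms comes out as 17:25:33 and frac holds the 712723932-nanosecond remainder.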

I have already tried the following:

TIME_FORMAT=%s%9N
TIMESTAMP_FIELDS=nanos

Based on what the Splunk docs on configuring timestamps say, it should take the date from the filename if it can only find the time inside the event: "4. If no events in a source have a date, Splunk software tries to find a date in the source name or file name. Time of day is not identified in filenames. (This requires that the events have a time, even though they don't have a date.)"

But I get the following warnings in the "adddata/datapreview" dashboard: "The TIME_FORMAT specified is matching timestamps (Mon Oct 17 07:07:52 2168) outside of the acceptable time window. If this timestamp is correct, consider adjusting MAX_DAYS_AGO and MAX_DAYS_HENCE." & "Failed to parse timestamp. Defaulting to file modtime."

It's probably because %s is given priority over the %9N in the TIME_FORMAT. If Splunk first captured the last 9 digits as nanoseconds and then used the rest as seconds, this should work.

Can anybody provide guidance?

1 Solution

jrballesteros05
Communicator

I did something similar but it was a little bit tricky.

In my case the source field is "/root/test_csv_parsing_20161011.txt", so I extracted a new field called newdate from the "source" metadata like this in props.conf:

EXTRACT-newdate=\/.*?\/.*?\_(?P<newdate>\d+)\..* in source
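As a side note, the same regex can be tested interactively before it goes into props.conf by running it against the source field with rex (illustrative):

index=proof | rex field=source "\/.*?\/.*?\_(?P<newdate>\d+)\..*"

With a source of "/root/test_csv_parsing_20161011.txt" this yields newdate=20161011.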

Then you can concatenate the fields and send the events to another index with the "collect" command, setting the new _time field:

index=proof
| eval seconds=round(nanos*0.000000001)
| eval newseconds=strftime(seconds,"%H%M%S")
| eval newtimestamp=newdate.newseconds
| eval c_time=strptime(newtimestamp,"%Y%m%d%H%M%S")
| eval _time=c_time
| eval _raw=_time + ": " + _raw
| collect index=mynewindex

Maybe this is not the cleanest way to do it, but I'm sure it can help you.
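One caveat with the search above: round(nanos*0.000000001) rounds to the nearest whole second, so the fractional part of the timestamp is lost, and strftime(seconds, "%H%M%S") treats the seconds-since-midnight value as an epoch time, which only lines up when the search head's timezone is UTC. A variation on the same idea (a sketch, reusing the newdate field extracted above) computes midnight from the filename date and adds the nanos offset directly:

index=proof
| eval midnight=strptime(newdate, "%Y%m%d")
| eval _time=midnight + nanos/1000000000
| eval _raw=_time + ": " + _raw
| collect index=mynewindex

Note that eval arithmetic is double-precision floating point, so the last few digits of nanosecond precision may not survive exactly.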


tchen_splunk
Splunk Employee

You'll need to customize datetime.xml to parse out the date from the filename:
https://answers.splunk.com/answers/172535/extract-date-time-from-source-path-with-custom-dat.html
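For reference, a custom datetime.xml is hooked up per sourcetype via the DATETIME_CONFIG setting in props.conf; the path below is illustrative and is resolved relative to $SPLUNK_HOME:

[csv]
DATETIME_CONFIG = /etc/apps/myapp/default/datetime.xml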

aholzer
Motivator

Thanks @tchen_splunk - I'll give this solution a try and mark it correct if it works.


aholzer
Motivator

Unfortunately this doesn't seem to work.

A bug report has been submitted to follow up.


moaf13
Path Finder

I tried all the answers online and my own ways of fixing things, and ended up with nothing that worked. I ended up using eval to extract the filedate from source.

FYI

I don't know how this line worked for him:

EXTRACT-newdate=\/.*?\/.*?\_(?P<newdate>\d+)\..* in source

but I had to go to transforms.conf to get that working.

props.conf
REPORT-filedate= tranName

transforms.conf
[tranName]
blah blah
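For anyone landing here later, here is a sketch of what that props/transforms pair might look like; the stanza name, regex, and field name are illustrative and assume the filename pattern from this thread:

props.conf
[csv]
REPORT-filedate = extract_filedate

transforms.conf
[extract_filedate]
SOURCE_KEY = MetaData:Source
REGEX = _(\d{8})\.
FORMAT = filedate::$1

This pulls the eight-digit date out of the source path into a search-time field called filedate.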

aholzer
Motivator

@jrballesteros05

Thanks for your reply, but I was hoping to get the _time field properly extracted from the very start. I will attempt @tchen_splunk's solution, and if that doesn't work I'll give your search-time solution a try.

Thanks again.


maciep
Champion

What does your entire props stanza look like for this sourcetype? Are you using INDEXED_EXTRACTIONS?


aholzer
Motivator

Yes, I am:

[csv]
CHARSET=UTF-8
INDEXED_EXTRACTIONS=csv
KV_MODE=none
SHOULD_LINEMERGE=false
category=Structured
description=Comma-separated value format. Set header and other settings in "Delimited Settings"
disabled=false
pulldown_type=true
TIME_FORMAT=%s%9N
FIELD_NAMES=sequenceNumber,nanos,msgType,poolId,accountId,fixMsgType,fixData
TIMESTAMP_FIELDS=nanos

maciep
Champion

I think the problem is that %s represents Epoch time, not seconds since midnight.

 %s The Unix Epoch Time timestamp, or the number of seconds since the Epoch: 1970-01-01 00:00:00 +0000 (UTC). (1352395800 is Thu Nov 8 09:30:00 2012)

So Splunk is grabbing 10 digits for the epoch, i.e. 6273371272, and that resolves to GMT: Mon, 17 Oct 2168 11:07:52.

I don't think there is a common time format variable that represents seconds since midnight.
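To see exactly what the parser did, format those ten digits back out as an epoch timestamp (a quick search-time check; strftime renders it in the search head's local timezone):

| makeresults
| eval t=strftime(6273371272, "%Y-%m-%d %H:%M:%S")

That lands in October 2168, which matches the MAX_DAYS_HENCE warning in the question.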


maciep
Champion

Oh, and is that configured on your forwarder, your indexer, or elsewhere?
