What is the best way to add timestamps to a large log file without timestamps?

dskillman
Splunk Employee

I have a file with ~6M events that gets FTP'd to Splunk daily. Unfortunately, I don't control the output, and the events have no timestamps. Using CURRENT_TIME breaks things because all events show up with the same time, so I have to search across an entire day at a time.

Any thoughts on how to get enough distinct timestamps so that I don't run into search limitations?

I was thinking of using a lightweight forwarder (LWF) to receive the FTP'd file and tweaking maxKBps in limits.conf so that CURRENT_TIME spreads the events across tens or hundreds of seconds. Thoughts?

1 Solution

gkanapathy
Splunk Employee

The easiest way is simply to name the file with the date/timestamp in a format that datetime.xml can recognize, assuming the events are all supposed to have the same timestamp. Splunk should then extract the date/time from the file name and auto-increment the extracted time when it finds that it's getting too many repeats.
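
As a rough illustration of the renaming idea, here is a minimal Python sketch that builds a dated file name before the file is dropped where Splunk reads it. The helper name `dated_name` and the `YYYY-MM-DD` pattern are assumptions for illustration only; whether Splunk picks the date up from the name depends on your datetime.xml, so check it against a test index first.

```python
from datetime import date
from pathlib import Path


def dated_name(path, day):
    """Return `path` with the date inserted before the file extension.

    Hypothetical helper: the YYYY-MM-DD pattern is an assumption, not
    something confirmed against Splunk's datetime.xml.
    """
    p = Path(path)
    return p.with_name(f"{p.stem}_{day:%Y-%m-%d}{p.suffix}")


if __name__ == "__main__":
    # Example: rename the daily upload to carry the day it belongs to.
    print(dated_name("events.log", date(2024, 6, 11)).name)
```

The actual rename could then be done with `Path.rename` on the uploaded file once the FTP transfer has finished.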

Similarly, if you can manipulate the file, you could prepend a single timestamp at the top of the file and subsequent events lacking a timestamp should get that timestamp.
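
If you go the prepend route, something along these lines might do it. This is only a sketch: the function name `prepend_timestamp` and the `YYYY-MM-DD HH:MM:SS` format are my assumptions, and you should confirm that your timestamp configuration recognizes that format. It streams the copy so a file with millions of lines is not loaded into memory.

```python
import shutil
from datetime import datetime


def prepend_timestamp(src, dst, when):
    """Write one timestamp line to `dst`, then append the contents of `src`.

    Hypothetical helper; the timestamp format is an assumption.
    """
    header = when.strftime("%Y-%m-%d %H:%M:%S") + "\n"
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(header.encode("ascii"))
        # Copy in chunks rather than reading the whole file at once.
        shutil.copyfileobj(fin, fout)
```

Writing to a separate `dst` and only then moving it into the monitored directory avoids Splunk reading a half-written file.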

If more than 100,000 events come in sequence for the same host/source/sourcetype with a timestamp in the same second, Splunk will auto-increment the timestamp by 1 second, specifically to avoid this issue, so either of these solutions should work.
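
To get a feel for how far that spreads a ~6M-event file, here is a back-of-the-envelope model in Python. It only mirrors the rule as described above (at most 100,000 events per second, then move to the next second); it is not Splunk's actual implementation, and the function names are made up for illustration.

```python
def second_offset(event_index, per_second=100_000):
    """Seconds added to the base timestamp for the event at this 0-based index.

    Simplified model: each second holds at most `per_second` events.
    """
    return event_index // per_second


def distinct_seconds(n_events, per_second=100_000):
    """Number of distinct one-second timestamps used by n_events events."""
    if n_events == 0:
        return 0
    return second_offset(n_events - 1, per_second) + 1


if __name__ == "__main__":
    print(distinct_seconds(6_000_000))  # 60
```

Under this model, the daily file would land in roughly a one-minute window starting at the extracted timestamp.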
