extracting timestamps from filenames - regex never matches

blakezinc
Engager

Hi, I'm new to Splunk. I know there have been a thousand questions about extracting timestamps from filenames; I've read a ton of them, as well as the docs, and I'm still totally stuck.

I'm indexing some pipe-delimited files that have no internal timestamp; the timestamp appears only in the file path and the filename. I've created a new datetime.xml in /opt/splunk/etc/system/local/, and in /opt/splunk/etc/system/local/props.conf I added a DATETIME_CONFIG setting for my sourcetype that points to this local copy.

The filenames being indexed look like this:
/home/(myusername)/sqldump/reports/2013-02-08_1100/(prefix)_2013-02-08_1100.txt
where the date changes daily, of course. My inputs.conf reflects this, and finds all the files to index just fine.

To try to extract the timestamps out, I added lines like this to my datetime.xml:

<define name="_masheddate3" extract="year, month, day, hour, minute"><text><![CDATA[.*(\d{4})-(\d{2})-(\d{2})\_(\d{2})(\d{2})]]></text></define>

<define name="_masheddate4" extract="year, month, day, hour, minute"><text><![CDATA[source::/home/<myusername>/sqldump/reports/(\d{4})-(\d{2})-(\d{2})\_(\d{2})(\d{2})]]></text></define>

Then, in both the <timePatterns> and <datePatterns> sections, I have:
<use name="_masheddate3"/> and
<use name="_masheddate4"/>. I added both just to see if EITHER of them would match.

To this point I thought I was all set, as I had followed this link: http://blogs.splunk.com/2009/12/02/configure-splunk-to-pull-a-date-out-of-a-non-standard-filename/

However, the events from all the files are imported with the same timestamp, which is the modified time of the files on the local disk (all the same). I restarted the server in debug mode and I see in the splunkd.log:

02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - put _masheddate3 regex=.*(\d{4})-(\d{2})-(\d{2})_(\d{2})(\d{2})
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * year
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * month
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * day
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * hour
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * minute
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - put _masheddate5 regex=source::/home/(myusername)/sqldump/reports/(\d{4})-(\d{2})-(\d{2})_(\d{2})(\d{2})
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * year
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * month
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * day
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * hour
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes -     * minute
02-12-2013 19:36:44.814 INFO  DateParser - Set timezone to: America/New_York
02-12-2013 19:36:44.815 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/home/(myusername)/sqldump/reports/2013-02-10_1100/(stringprefix)_2013-02-10_1100.txt", data_host="(myhostname)", data_sourcetype="(mysourcetype)"

This is followed by many more of those "Failed to parse timestamp" messages.

I have tried a hundred different regex variations, and nothing I do will match. This is with version 5.0.2. I know that I could set DATETIME_CONFIG to CURRENT to get the indexing time in there, or just stick with the modified time, but I'd really prefer to get it from the filename/path itself.
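One way to narrow this down is to check the pattern outside Splunk. The sketch below runs the _masheddate3 regex, as echoed in splunkd.log, against a sample path using Python's re module. The username and prefix in SAMPLE_SOURCE are placeholders for the redacted values, and Python's regex engine is not the one Splunk uses, so a match here only suggests the pattern itself is sound. It says nothing about which string Splunk actually applies the pattern to.

```python
import re

# Pattern copied from the _masheddate3 line in splunkd.log.
MASHEDDATE3 = re.compile(r".*(\d{4})-(\d{2})-(\d{2})_(\d{2})(\d{2})")

# Placeholder path: "myusername" and "prefix" stand in for the redacted values.
SAMPLE_SOURCE = (
    "/home/myusername/sqldump/reports/2013-02-08_1100/prefix_2013-02-08_1100.txt"
)


def extract_parts(source):
    """Return (year, month, day, hour, minute) as strings, or None if no match."""
    match = MASHEDDATE3.match(source)
    return match.groups() if match else None


if __name__ == "__main__":
    print(extract_parts(SAMPLE_SOURCE))
```

If this returns the five expected groups, the regex syntax is probably not the culprit, which would point toward how the pattern is being applied rather than how it is written.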

Does anyone have any ideas? I'd love any suggestions! Maybe there's something I'm just completely missing.
Thanks!

splunk24
Path Finder

The filename is available through the source field,
so you can overwrite _time with a timestamp extracted from source.
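A rough sketch of that idea in Python, purely to illustrate the extraction logic. It is not Splunk configuration, and the function name timestamp_from_source and the sample path are made up for the example. It pulls the first YYYY-MM-DD_HHMM token out of the source path and parses it into a datetime, which is the value you would then want _time to carry.

```python
import re
from datetime import datetime

# Matches a token such as "2013-02-08_1100" anywhere in the path.
STAMP = re.compile(r"\d{4}-\d{2}-\d{2}_\d{4}")


def timestamp_from_source(source):
    """Parse the first date token in a source path; return None if absent."""
    match = STAMP.search(source)
    if match is None:
        return None
    # The result is a naive datetime; no timezone is assumed here.
    return datetime.strptime(match.group(0), "%Y-%m-%d_%H%M")


if __name__ == "__main__":
    # Hypothetical path modelled on the one in the question.
    path = "/home/myusername/sqldump/reports/2013-02-08_1100/prefix_2013-02-08_1100.txt"
    print(timestamp_from_source(path))
```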

gelica
Communicator

I'm also having this problem. Did you ever figure out how to solve it?

blakezinc
Engager

Nope. I ended up changing the format of my input files to include the timestamp, which isn't a viable solution for everyone.

All I can figure is it's a bug.

jonuwz
Influencer

Well, that's really, really annoying. I can't get this to work at all. Even when the files match the source:: patterns in /etc/datetime.xml, it still fails.

Tried with 4.3 and 5.

+100 bounty for a working config.
