Hi, I'm new to Splunk. I know there have been a thousand questions on extracting timestamps from filenames. I've read a ton of them, as well as the docs, but I'm still totally stuck.
I'm indexing some pipe-delimited files that have no internal timestamp; the timestamp appears in both the file path and the filename. I've created a new datetime.xml in /opt/splunk/etc/system/local/, and in /opt/splunk/etc/system/local/props.conf I added a DATETIME_CONFIG setting for my sourcetype that points to this local version.
The filenames being indexed look like this:
/home/(myusername)/sqldump/reports/2013-02-08_1100/(prefix)_2013-02-08_1100.txt
The date changes daily, of course. My inputs.conf reflects this and finds all the files to index just fine.
To try to extract the timestamps, I added lines like these to my datetime.xml:
<define name="_masheddate3" extract="year, month, day, hour, minute"><text><![CDATA[.*(\d{4})-(\d{2})-(\d{2})\_(\d{2})(\d{2})]]></text></define>
<define name="_masheddate4" extract="year, month, day, hour, minute"><text><![CDATA[source::/home/<myusername>/sqldump/reports/(\d{4})-(\d{2})-(\d{2})\_(\d{2})(\d{2})]]></text></define>
Then in the <timePatterns> and <datePatterns> sections, I have:
<use name="_masheddate3"/>
<use name="_masheddate4"/>
I added both just to see if EITHER of them would match.
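One way to rule out the regexes themselves is to test them outside Splunk against a path of the same shape. This is a minimal Python sketch, assuming Python's `re` module handles these simple patterns (`\d`, `{n}`, a greedy `.*`, an escaped underscore) the same way Splunk's regex engine does; `myusername` and `prefix` are hypothetical stand-ins for the redacted values.

```python
import re

# The two patterns from datetime.xml, copied as raw strings.
# Assumption: "myusername" stands in for the redacted user name, and Python's
# re module treats these simple constructs the same way Splunk's engine does.
MASHEDDATE3 = r".*(\d{4})-(\d{2})-(\d{2})\_(\d{2})(\d{2})"
MASHEDDATE4 = r"source::/home/myusername/sqldump/reports/(\d{4})-(\d{2})-(\d{2})\_(\d{2})(\d{2})"

# Hypothetical example path with the same layout as the real files.
PATH = "/home/myusername/sqldump/reports/2013-02-08_1100/prefix_2013-02-08_1100.txt"


def try_pattern(pattern, text):
    """Return the (year, month, day, hour, minute) groups, or None if no match."""
    m = re.search(pattern, text)
    return m.groups() if m else None


if __name__ == "__main__":
    # Greedy .* backtracks to the last date in the string (the filename one).
    print(try_pattern(MASHEDDATE3, PATH))
    # The literal "source::" prefix is not part of the bare path, so no match.
    print(try_pattern(MASHEDDATE4, PATH))
    # With the prefix prepended, the directory-name date matches.
    print(try_pattern(MASHEDDATE4, "source::" + PATH))
```

If `_masheddate3` matches here but Splunk still logs "Failed to parse timestamp", the cause is more likely in how the pattern is wired into datetime.xml or props.conf than in the regex itself. Note that `_masheddate4` can only match text that literally contains `source::`.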
At this point I thought I was all set, since I had followed this blog post: http://blogs.splunk.com/2009/12/02/configure-splunk-to-pull-a-date-out-of-a-non-standard-filename/
However, the events from all the files are indexed with the same timestamp, which is the modification time of the files on the local disk (identical for all of them). I restarted the server in debug mode, and in splunkd.log I see:
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - put _masheddate3 regex=.*(\d{4})-(\d{2})-(\d{2})_(\d{2})(\d{2})
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * year
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * month
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * day
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * hour
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * minute
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - put _masheddate5 regex=source::/home/(myusername)/sqldump/reports/(\d{4})-(\d{2})-(\d{2})_(\d{2})(\d{2})
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * year
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * month
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * day
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * hour
02-12-2013 19:36:44.814 DEBUG LoadDateParserRegexes - * minute
02-12-2013 19:36:44.814 INFO DateParser - Set timezone to: America/New_York
02-12-2013 19:36:44.815 DEBUG AggregatorMiningProcessor - Failed to parse timestamp. Defaulting to time specified by data input. - data_source="/home/(myusername)/sqldump/reports/2013-02-10_1100/(stringprefix)_2013-02-10_1100.txt", data_host="(myhostname)", data_sourcetype="(mysourcetype)"
This is followed by many more of those "Failed to parse timestamp" messages.
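For reference when checking what Splunk actually assigns, this Python sketch computes the timestamp the events from the logged file should get if the date were taken from the filename. It assumes a fixed UTC-5 offset, which matches America/New_York (the timezone in the log) only for dates outside daylight saving time, such as February; `timestamp_from_path` is a hypothetical helper, not anything Splunk provides.

```python
import re
from datetime import datetime, timedelta, timezone

# data_source value from the splunkd.log line above, with the redacted
# placeholders left exactly as they appear in the log.
SOURCE = "/home/(myusername)/sqldump/reports/2013-02-10_1100/(stringprefix)_2013-02-10_1100.txt"

# Assumption: fixed EST offset (UTC-5). Correct for February in America/New_York,
# but it ignores daylight saving time; use zoneinfo for arbitrary dates.
EST = timezone(timedelta(hours=-5))


def timestamp_from_path(path):
    """Pull YYYY-MM-DD_HHMM out of the filename and return an aware datetime."""
    m = re.search(r"(\d{4})-(\d{2})-(\d{2})_(\d{2})(\d{2})\.txt$", path)
    if m is None:
        return None
    year, month, day, hour, minute = (int(g) for g in m.groups())
    return datetime(year, month, day, hour, minute, tzinfo=EST)


if __name__ == "__main__":
    ts = timestamp_from_path(SOURCE)
    print(ts.isoformat())       # 2013-02-10T11:00:00-05:00
    print(int(ts.timestamp()))  # 1360512000
```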
I have tried a hundred different regex variations, and nothing I do matches. This is with Splunk 5.0.2. I know I could set DATETIME_CONFIG to CURRENT to use the indexing time, or just keep the modification time, but I'd really prefer to take the timestamp from the filename/path itself.
Does anyone have any ideas? I'd love any suggestions; maybe there's something I'm just completely missing.
Thanks!