According to doco: "The date_hour field ... is extracted from the event's timestamp (the value in _time)." Consider this test:
index=*
| eval hour=strftime(_time,"%H")
| eval shift=date_hour-hour
| stats count by shift index date_zone
| stats values(eval(index."-".count)) as sourcetype sum(count) as count by shift date_zone
| sort by -shift
The output is all over the map:
shift  date_zone  sourcetype     count
17     local      main-3550674   3573773
                  r-7006
                  sample-16093
16     local      r-1572         1572
0      0          main-1158239   1158239
-7     local      main-3817593   3878299
                  r-18887
                  sample-41819
-8     local      main-1626      4465
                  r-2839
When I examine the raw data closely, it seems that strftime(_time,"%H") reports the hour of day correctly.
A similar inconsistency exists between date_mday and "%d".
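One thing worth noting about the table: a plain difference of two hour-of-day values wraps around midnight, so shifts that differ by exactly 24 (17 and -7, 16 and -8) describe the same offset. A minimal Python sketch of that folding, using only the shift values from the output above (the function name fold_shift is made up for illustration):

```python
def fold_shift(shift: int) -> int:
    """Fold a raw hour-of-day difference into the range 0..23."""
    return shift % 24

# The shift column from the search output above.
raw_shifts = [17, 16, 0, -7, -8]

# Distinct offsets once the midnight wrap-around is removed.
folded = sorted({fold_shift(s) for s in raw_shifts})
print(folded)  # [0, 16, 17]
```

Read this way, the output contains three distinct offsets rather than five.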
The one case where I know the date_* fields are all a bit wrong is when _time is being extracted from an epochtime value, i.e. when _time is calculated from something in the raw event text that is itself a number of seconds since 1970.
date_hour, date_minute and all their friends are technically not always extracted. In cases where the timestamp extraction can't find a reliable timezone, Splunk isn't supposed to create any of these fields. That, by the way, can be a surprise for people who have come to expect they are always there.
But when _time is extracted from an epochtime value in the events, the timestamp-extraction code really has no valid timezone listed there, and yet it has always, as far as I know, had a bug where it erroneously assumes the data is in GMT and goes on to create date_* fields as though the data were unequivocally in GMT.
The only recourse that I know of is to just stop trusting date_* completely when you're using one of these data sets.
Or to hardcode an offset into your search that represents your offset from GMT (and then change it twice a year for DST!). I recommend not trusting it, and just creating your own little fields in props.conf:
EVAL-hour_of_day=strftime(_time,"%H")
EVAL-day_of_week=strftime(_time,"%a")
It's been this way for years, I've filed it several times as a bug, I've even had conversations with engineering (years ago now) about why it's a tremendous pain for them to fix.
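To make the epoch-time case concrete, here is a rough Python sketch (not Splunk code) of what the behavior described above amounts to. The event time and the viewer's UTC-7 offset are assumptions picked purely for illustration:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event whose raw text carries only an epoch value (no zone info).
epoch = datetime(2016, 3, 29, 21, 54, 11, tzinfo=timezone.utc).timestamp()

# Assumed viewer timezone for illustration: a fixed UTC-7 offset.
VIEWER_TZ = timezone(timedelta(hours=-7))

# Analogue of date_hour as described above: the hour read as if the data were GMT.
gmt_hour = datetime.fromtimestamp(epoch, timezone.utc).hour

# Analogue of strftime(_time, "%H"): the hour in the viewer's timezone.
viewer_hour = int(datetime.fromtimestamp(epoch, VIEWER_TZ).strftime("%H"))

print(gmt_hour, viewer_hour, gmt_hour - viewer_hour)  # 21 14 7
```

The difference is just the viewer's offset from GMT, which is why a hardcoded offset would need adjusting whenever DST changes.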
Where did you find that statement in the documentation? I couldn't find it - and I think it is wrong...
http://docs.splunk.com/Documentation/Splunk/6.3.3/Knowledge/UseDefaultFields - scroll down to "Default datetime fields". The statement surely is wrong in the sample that I just examined.
Interestingly, closer to the top of the doco, there is a correct note:
Note: Only events that have timestamp information in them as generated by their respective systems will have date_* fields. If an event has a date_* field, it represents the value of time/date directly from the event itself. If you have specified any timezone conversions or changed the value of the time/date at indexing or input time (for example, by setting the timestamp to be the time at index or input time), these fields will not represent that.
I have this example, where _time does not come from an epoch time in the source event. It is a syslog entry:
Mar 29 17:54:11 amiohdrmp1 snmpd[14773]: Connection from UDP: [127.0.0.1]:46920
In this particular case, syslog uses EDT without printing zone info. Splunk correctly dates this event at 3/29/16 9:54:11.000 PM, i.e., 21:54:11. As a result, %H correctly gives 21. However, date_hour is 17, the literal value from the source text!
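The gap between 17 and 21 in this example can be reproduced outside Splunk. The Python sketch below is only an illustration under stated assumptions: EDT is modeled as a fixed UTC-4 offset, the viewer's timezone is taken to be UTC (which is consistent with the 9:54 PM display), and the year 2016 is supplied by hand because the raw line does not carry one.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")  # assumed zone of the syslog source
VIEWER_TZ = timezone.utc                    # assumed display timezone

raw = "Mar 29 17:54:11 amiohdrmp1 snmpd[14773]: Connection from UDP: [127.0.0.1]:46920"

# Literal value parsed from the text: the analogue of date_hour.
literal = datetime.strptime("2016 " + raw[:15], "%Y %b %d %H:%M:%S")
literal_hour = literal.hour

# Normalized time: attach the source zone, then render in the viewer's zone.
# This is the analogue of strftime(_time, "%H").
epoch = literal.replace(tzinfo=EDT).timestamp()
normalized_hour = int(datetime.fromtimestamp(epoch, VIEWER_TZ).strftime("%H"))

print(literal_hour, normalized_hour)  # 17 21
```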
Whereas this case looks like a fixable bug, the designer may have other use cases in mind. You have sufficiently scared me, so I'll just accept "in date_* no trust" as answer:-)
To close the loop: after reviewing @lgnuin's comment about the doco being wrong and discovering the correct note on the same doco page, the above example can be explained as kind of expected behavior. Here, syslog is not logging the year, so Splunk discarded "Mar 29 17:54:11" and supplied the indexer timestamp. Per the correct part of the doco: "If you have ... changed the value of the time/date at indexing or input time (for example, by setting the timestamp to be the time at index or input time), these fields will not represent that."
In this sense, it is not a bug.
Actually, Splunk does not discard "Mar 29 17:54:11".
If the event arrived to be indexed before Mar 29, 2016, Splunk would assume the year to be 2015. Otherwise, Splunk would assume the current year (2016).
(An easier-to-understand example: if an event showed up today (30-Mar-2016) with a timestamp of "Aug 9 17:54:11", Splunk would assume 2015. For a timestamp of "Feb 2 17:54:11", it would assume 2016.)
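The rule of thumb above can be sketched in a few lines of Python. This only mirrors the description given in this post; it is not Splunk's actual implementation, and the function name infer_year and the noon arrival time are made up for illustration:

```python
from datetime import datetime

def infer_year(stamp: str, arrival: datetime) -> int:
    """Pick a year for a 'Mon D HH:MM:SS' stamp that has none.

    Try the arrival year first; if that would place the event after
    its own arrival time, fall back to the previous year.
    """
    candidate = datetime.strptime(f"{arrival.year} {stamp}", "%Y %b %d %H:%M:%S")
    return arrival.year if candidate <= arrival else arrival.year - 1

arrival = datetime(2016, 3, 30, 12, 0, 0)  # "today" in the example above

print(infer_year("Aug 9 17:54:11", arrival))   # 2015
print(infer_year("Feb 2 17:54:11", arrival))   # 2016
print(infer_year("Mar 29 17:54:11", arrival))  # 2016
```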
Many, many timestamps have this form, although syslog is the most common. If Splunk wasn't able to deal with this, a lot of inputs would be broken.
Why did Splunk figure it was EDT? Check out this docs page, in the section "How Splunk applies time zones". My guess is that the forwarder supplied the timezone of the underlying OS.
Yes, what @sideview said: "stop trusting date_* completely" - although I would go even farther and say "don't use date_*".
_time is "normalized" - by that I mean: it is parsed by Splunk, using any timezone and props.conf information available; it is stored in the index in UTC; it is displayed to you based on your user timezone setting in the GUI. If you extract the hour (%H) from _time, it will always be "right" and it always exists.
I would really like to have a way to suppress the date fields altogether, so that users can't see them and use them without understanding the consequences.
From this page in the Splunk Docs:
http://docs.splunk.com/Documentation/Splunk/6.3.3/Knowledge/UseDefaultFields
"The datetime values are the literal values parsed from the event when it is indexed, regardless of its timezone."