Knowledge Management

summary index event date and sourcetype

hiddenkirby
Contributor

When I create a summary index for the speed benefit and to filter results, there are two main things I lose:

  1. Each event then (after summary indexing) has a new date of when the summary index entry was created ... no longer the original event date.

  2. The sourcetype is now stash ... instead of the original sourcetype.

Is there any way around this? A way to pass this through per event?

Apologies if this was cryptic.

1 Solution

Lowell
Super Champion

The summary indexing process will use _time for the event's timestamp if _time is a field that exists in your results. (As per "How does summary indexing handle time?".) But in the normal case of using some stats-like command, you don't often keep the _time field around, so the summary index process falls back to the time of your search.

If you want to use one of the stats commands and you want a better time breakdown, you could look at using the bucket command and setting span to something less than the interval of your saved search:

... | bucket _time span=5m | stats avg(thruput) by _time host

(You may also find sitimechart helpful here, but I've generally avoided all the si* helper commands and handled the funky statistical corner cases myself rather than let Splunk do it. I've seen some of the si* commands produce more "summary" events than I had input events... which is a step backwards!)

With bucket or (si)?timechart, you will still not have the exact _time of the original event, but that's rather central to how summary indexing works. I suppose you could do a | stats min(_time) as _time by field, but you will still only keep one timestamp per group of events... the bottom line is that you can't keep the exact same timestamp for all your events without duplicating all your events, which then defeats the purpose of summary indexing....
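For example, something roughly like this (just a sketch; thruput and host are the same made-up fields from the example above):

... | stats avg(thruput) as avg_thruput min(_time) as _time by host

Each summary row then gets stamped with the earliest event time in its group instead of the time the search ran.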


In terms of keeping sourcetype: you can't (or shouldn't) do it. In Splunk 4.x, the summary indexing process does now set source to the name of your saved search. You still have a copy of the saved search name in the event itself, in a field called search_name, but searching against source (since it's one of the primary indexed fields) is really fast. So I would just suggest that you leverage that instead. You still don't have a great drilldown option with this, but it's possible. (You can let the sourcetype field go to your summary index, but it gets renamed orig_sourcetype, which I suppose you could then leverage for drilldown purposes.) I suppose you could make a TRANSFORMS entry on the stash sourcetype that would look for orig_sourcetype in your event and then assign the sourcetype to that value, but that just seems like a bad idea....
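For what it's worth, a drilldown-ish search against the summary index might look roughly like this (just a sketch; the index name summary and the saved search name "my_thruput_summary" are made up):

index=summary source="my_thruput_summary" orig_sourcetype=access_combined host=webhost01

Since index and source are indexed fields that filter is cheap, and orig_sourcetype is just a search-time field match on the summary events.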


Lowell
Super Champion

BTW. It may be more helpful to add to your original question (by using the "edit" feature) rather than using comments.



Jason
Motivator

"I've generally avoided all the si* helper commands and handled the funky statistical corner cases myself" - Is there a writeup anywhere on what these cases are, or even what the si* commands do?


gkanapathy
Splunk Employee

Yeah, just use 'orig_sourcetype' if you need it. Similarly, the 'host' is usually set to 'orig_host'.

It is often useful to store min(_time) and max(_time) in aggregates (but again only one of each per aggregate) for purposes of weighting values by time intervals, where events are less regular than bucketed time spans.
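E.g., roughly (all field and search names made up, just a sketch). The populating search keeps the interval bounds per aggregate:

... | stats avg(thruput) as avg_thruput min(_time) as min_time max(_time) as max_time by host

and then the reporting search off the summary index weights each row by the interval it actually covers:

index=summary source="my_thruput_summary" | eval span_sec=max_time-min_time | stats sum(eval(avg_thruput*span_sec)) as weighted_sum sum(span_sec) as total_sec by host | eval weighted_avg=weighted_sum/total_sec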


hiddenkirby
Contributor

To extend a bit on that... the idea was that, since the summary index had an aggregate (stats values) distinct showing of values I could select on, I could drill into a list of events with that field=value in them.


hiddenkirby
Contributor

High-level goal: I want to report (dashboards/charts/tables) on a specific bunch of fields extracted (using a nasty regex) from a fairly sizable index. The idea was that a summary index pulling only the fields I need would be smarter to dashboard off of...
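Something along these lines is what I had in mind for the populating search (names and the regex are made up, just to show the shape of it):

index=big_index sourcetype=my_app | rex "response_time=(?<response_ms>\d+)" | bucket _time span=5m | stats count avg(response_ms) as avg_response_ms by _time host

The dashboard panels would then search the summary index instead of the big one.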


Lowell
Super Champion

Yeah, it's a bit cryptic. It sounds like summary indexing is working the way it was intended to. If you provide more details about what you are trying to do, that would be helpful; it could be that summary indexing isn't the best fit for your use case. What level of event reduction are you able to achieve? (What's the ratio of input events to summary events?)
