I have a relatively large number of events (millions a week) being indexed and funneled into their own index based on source and source type. This stream of events contains information about user activity in one of our products. We want to summarize user activity on a daily basis, then build a dashboard that visualizes this summary in various ways (often on longer timescales). We will likely use an accelerated search (we prefer the simplicity) but may decide to use a summary search instead.
Note that we are currently still using Splunk 5.0.5.
The following is an example of a summary query that I am experimenting with, and I am looking for suggestions on how to improve it. It seems a little wrong to use if/match the way I am.
index=myproduct build_type=prod (event_type="creating shape" OR event_type="Selecting tool" OR event_type="Undoing shape" OR event_type="Redoing shape")
| eval DrawEvent=if(match(event_type,"creating shape"),"1","0")
| eval ToolEvent=if(match(event_type,"Selecting tool"),"1","0")
| eval UndoEvent=if(match(event_type,"Undoing shape"),"1","0")
| eval RedoEvent=if(match(event_type,"Redoing shape"),"1","0")
| bucket _time span=1day
| stats sum(DrawEvent) AS UserDrawCount sum(ToolEvent) AS UserToolCount sum(UndoEvent) AS UserUndoCount sum(RedoEvent) AS UserRedoCount by _time,logged_user_id
...which produces a table like the following...
_time                    logged_user_id  UserDrawCount  UserToolCount  UserUndoCount  UserRedoCount
3/16/14 12:00:00.000 AM  AAAAA           59             7              0              0
3/16/14 12:00:00.000 AM  BBBBBB          135            35             42             2
3/16/14 12:00:00.000 AM  CCCCC           139            3              0              0
3/16/14 12:00:00.000 AM  DDDDD           895            65             54             1
Note that in a future version of the product we are reworking the naming conventions so that a wildcard can be used in the search (instead of such specific text) to narrow the event stream down to a family of user actions that we wish to summarize in the same query.
All in all - yeah, seems reasonable to me.
Consider moving the categorizing-eval-chain out into a macro for easy reuse and maintenance.
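As a rough sketch of the macro idea, the eval chain from the question could live in a macros.conf stanza like the one below. The macro name categorize_shape_events is made up for illustration, and the definition is kept on one line to avoid any line-continuation concerns.

```
# macros.conf (hypothetical macro name; the definition is the eval chain from the question)
[categorize_shape_events]
definition = eval DrawEvent=if(match(event_type,"creating shape"),"1","0") | eval ToolEvent=if(match(event_type,"Selecting tool"),"1","0") | eval UndoEvent=if(match(event_type,"Undoing shape"),"1","0") | eval RedoEvent=if(match(event_type,"Redoing shape"),"1","0")
```

The search would then call the macro with backticks in place of the four eval commands:

```
index=myproduct build_type=prod (event_type="creating shape" OR event_type="Selecting tool" OR event_type="Undoing shape" OR event_type="Redoing shape")
| `categorize_shape_events`
| bucket _time span=1day
| stats sum(DrawEvent) AS UserDrawCount sum(ToolEvent) AS UserToolCount sum(UndoEvent) AS UserUndoCount sum(RedoEvent) AS UserRedoCount by _time,logged_user_id
```

When the event naming changes in a future product version, only the macro definition should need updating.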
You could merge the match into the stats like this:
... | stats count(eval(match(event_type, "creating shape"))) as UserDrawCount ...
But that's not necessarily better to read and maintain. From a performance point of view it's not going to matter much.
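For reference, here is a sketch of what the whole search from the question might look like with every match folded into stats this way. It reuses the field names and event_type values from the question; each count(eval(...)) counts only the events for which the expression is true, so the separate eval commands are no longer needed.

```
index=myproduct build_type=prod (event_type="creating shape" OR event_type="Selecting tool" OR event_type="Undoing shape" OR event_type="Redoing shape")
| bucket _time span=1day
| stats count(eval(match(event_type,"creating shape"))) AS UserDrawCount
        count(eval(match(event_type,"Selecting tool"))) AS UserToolCount
        count(eval(match(event_type,"Undoing shape"))) AS UserUndoCount
        count(eval(match(event_type,"Redoing shape"))) AS UserRedoCount
        by _time,logged_user_id
```

Note that each count(eval(...)) needs an AS clause to name its output field.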
Basically, does it make sense to search on event_type to narrow the number of events looked at, then use eval with if(match(...)) to tally each matched event_type, then bucket by day, then summarize using stats? Or is there a better way to do the daily summary without the eval/if(match(...)) step, perhaps by using features of stats more directly?
Again, it needs to be grouped by day and logged-in user.
Yeah, feeding that into a summary index will give you great long-term statistics performance.
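One way this might look, as a sketch only: run the daily search on a schedule and write its results to a summary index with collect. The index name myproduct_summary is an assumption and would have to exist already. The usual alternative is to enable summary indexing on the saved search itself.

```
index=myproduct build_type=prod (event_type="creating shape" OR event_type="Selecting tool" OR event_type="Undoing shape" OR event_type="Redoing shape")
| bucket _time span=1day
| stats count(eval(match(event_type,"creating shape"))) AS UserDrawCount
        count(eval(match(event_type,"Selecting tool"))) AS UserToolCount
        count(eval(match(event_type,"Undoing shape"))) AS UserUndoCount
        count(eval(match(event_type,"Redoing shape"))) AS UserRedoCount
        by _time,logged_user_id
| collect index=myproduct_summary
```

A dashboard panel could then report over the much smaller summary data, for example weekly totals:

```
index=myproduct_summary
| timechart span=1w sum(UserDrawCount) AS DrawCount sum(UserToolCount) AS ToolCount sum(UserUndoCount) AS UndoCount sum(UserRedoCount) AS RedoCount
```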
I am basically looking to see if what I am doing above is reasonable or if a better way exists.
I have a stream of events like the following coming in from users using our app...
logged_user_id="AAAAA" event_type="creating shape" ...
logged_user_id="BBBBBB" event_type="Selecting tool" ...
logged_user_id="AAAAA" event_type="creating shape" ...
logged_user_id="CCCCC" event_type="Redoing shape" ...
I want to summarize this into a daily tally of each type of event by user, turning multiple events into a single event for each user on each day. This will then be used to feed subsearches.
Maybe it's just me, but what is your question?