I have a multi-value field "activity" that can be very long and contain many unique values (60+). I want to be able to summarize the count of activities per hour per user in order to populate a summary index. I've used mvexpand and the chart command to create a new field based on each value and count, but I receive the following message "chart command has reached the limit for data points" even when using limit=0 (Splunk 4.3.5). If I simply do a count by activity then I'll receive many more events than is in the actual source. For example, a day with 40,000 events would create a summary of 200,000+ events.
In order to keep the summary as concise as possible, is there a way to transform a multi-value field from something like activity="viewMailbox compose send compose send compose save"
to activity_count="viewMailbox=1 compose=3 send=2 save=1"
?
If not, is there any issue with creating a summary that results in much more data than is contained in the actual source? Should I be trying to summarize this in a different way that is more efficient for Splunk?
Many thanks in advance!
Updated for solution
Just wanted to follow up with what I finally did. First, I created a new multi value field by using
| makemv delim=" " activity | mvexpand activity
| stats count(activity) as count by activity
| eval activity_count=activity."=".count
| stats values(activity_count) as activity sum(count) as count
This gave me a new multi value field:
activity_count="viewMailbox=1,compose=3,send=2,save=1"
I then use mvindex
to separate the key value pairs at search time when needed.
| eval activity_count=split(activity,"=")
| eval activity=mvindex(activity_count,0)
| eval count=mvindex(activity_count,1)
This works for my needs. Hope this helps somebody else.
nicely done
Updated original post with my final solution since it wouldn't fit in the comment.
what about
... | stats values(activity) as distinct_list by User
| eval distinct_activities=mvcount(distinct_list)
Thanks for your help.
| eval activity_mv=split(activity_count," ") | mvexpand activity_mv | eval activity_list=split(activity_mv,"=") | eval key=mvindex(activity_list,0) | eval value=mvindex(activity_list,1)
I think I'm making progress. I was able to create a field called activity_count that is formatted as activity_count="viewMailbox=1 compose=3 send=2 save=1"
. Now, is it possible to then extract the key value pairs at search time so that I could know that compose is 3?
about about mvexpand | chart c over expanded_activity by user_id
?
That's part of the issue. I could hard code but there are currently 60+ activities which will continue to increase over time and using the chart over is causing the error "chart command has reached the limit for data points" even when using the limit=0
. I'm unsure why I am reaching the limit when using this option. This is why I'm trying to come up with another workable solution. I may just have to use | mvexpand activity | bucket _time span=1h | stats count by activity user _time
although this would create a very large summary.
if you have a predefined list of activities that can be hard-coded in the query, maybe use "eval" in the chart / stats in some way.. you may still need to use an mvexpand to get the events into the correct format though...
... | stats c(eval(activity="compose")) as compose_count c(eval(activity="send")) as send_count
or maybe expand + chart c over ... by ... (not sure if you are still getting the limit issue with this).
... | activity_mv=split(activity," ") | mvexpand activity_mv | chart c over activity_mv by userId
Thanks for the suggestion. I don't want to count the number of distinct activities, I want to be able to count the number of times an activity appears in the list. I know I can use mvexpand and get something like:
user activity count
1 send 2
1 compose 3
1 save 1
But I would really like to have this
user activity_count
1 send(2)
compose(3)
save(1)
Is this possible?