My goal is to create a stacked area timechart that has the number of unique "users" on the y-axis, split by "user age", where "user age" is bucketed into 1-day spans and the first 5 buckets from 0 upward are included in the plot (with the rest of the buckets in OTHER). The search
timechart dc(user) span=1d by limit=5 user_age span=1d
does almost what I want, except that it includes the 5 largest buckets, not the first 5 consecutive buckets as I would want. Keeping the top N values based on the sum of each series is the default, documented behavior of the limit option of timechart.
Thanks,
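To make the difference between the two selection rules concrete, here is a small Python model (not Splunk itself). It assumes hypothetical rows of (day, user_age bucket in days, distinct users) where older buckets happen to have more users. Ranking by summed value, which is what limit=N does by default, then picks the oldest buckets, while the desired rule picks the 5 smallest bucket values. The tie-breaking order is an assumption of this sketch, not documented Splunk behavior.

```python
from collections import defaultdict

# Hypothetical rows: (_time day, user_age bucket in days, distinct users)
rows = [("day1", age, age + 1) for age in range(8)]  # older buckets have more users

def top_n_by_sum(rows, n=5):
    """Keep the n series with the largest summed value (the limit=N default)."""
    totals = defaultdict(int)
    for _day, age, count in rows:
        totals[age] += count
    # Largest total first; ties broken by the smaller bucket (an assumption).
    ranked = sorted(totals, key=lambda age: (-totals[age], age))
    return set(ranked[:n])

def first_n_ascending(rows, n=5):
    """Keep the n smallest age buckets, whatever their totals."""
    return set(sorted({age for _day, age, _count in rows})[:n])

print(sorted(top_n_by_sum(rows)))       # [3, 4, 5, 6, 7]
print(sorted(first_n_ascending(rows)))  # [0, 1, 2, 3, 4]
```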
Give this workaround a try:
Your base search with fields user, _time, user_age | bucket span=1d _time | stats dc(user) as UserCount by _time user_age | sort 0 user_age | streamstats count as sno by user_age | eval sno=if(sno>1,0,sno) | accum sno | eval user_age=if(sno>5,"OTHER",user_age) | timechart span=1d sum(UserCount) by user_age
Using sort 0 rather than a plain sort keeps sort from truncating its output at its default limit of 10,000 results before the ranking step.
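The ranking trick in the middle of that search (streamstats count by user_age, zeroing every row except the first per value, then accum) turns each distinct user_age into its position in ascending order. A minimal Python model of those steps follows, assuming hypothetical stats output in which user_age is already a whole number of days; the function names are illustrative only and do not correspond to any Splunk API.

```python
from collections import defaultdict

def rank_age_buckets(rows, keep=5):
    """Relabel every user_age after the first `keep` distinct values as "OTHER"."""
    rows = sorted(rows, key=lambda r: r["user_age"])   # sort 0 user_age (stable)
    seen = defaultdict(int)   # streamstats count as sno by user_age
    running = 0               # accum sno
    out = []
    for r in rows:
        seen[r["user_age"]] += 1
        sno = 1 if seen[r["user_age"]] == 1 else 0    # eval sno=if(sno>1,0,sno)
        running += sno                                 # accum sno -> rank of this user_age
        label = "OTHER" if running > keep else r["user_age"]
        out.append({**r, "user_age": label})
    return out

def timechart_sum(rows):
    """timechart span=1d sum(UserCount) by user_age, as a (day, series) -> total dict."""
    table = defaultdict(int)
    for r in rows:
        table[(r["_time"], r["user_age"])] += r["UserCount"]
    return dict(table)

# Hypothetical output of "stats dc(user) as UserCount by _time user_age"
stats_rows = [
    {"_time": "d1", "user_age": 0, "UserCount": 4},
    {"_time": "d1", "user_age": 3, "UserCount": 2},
    {"_time": "d1", "user_age": 6, "UserCount": 1},
    {"_time": "d2", "user_age": 1, "UserCount": 5},
    {"_time": "d2", "user_age": 2, "UserCount": 3},
    {"_time": "d2", "user_age": 4, "UserCount": 2},
    {"_time": "d2", "user_age": 5, "UserCount": 7},
    {"_time": "d2", "user_age": 6, "UserCount": 1},
]
chart = timechart_sum(rank_age_buckets(stats_rows))
print(chart[("d2", "OTHER")])  # 8: ages 5 and 6 fall outside the first 5 buckets
```

Because the rows are sorted by user_age first, all rows sharing a user_age are contiguous, so every one of them gets the same accumulated rank.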
It's tough to say without looking at the logs. Would you mind providing some, along with sample output based on that data?
Thanks for the revised answer. It seems I need to bucket user_age at the beginning to make your method work (bucket span=1d user_age). The resulting counts are slightly inflated, though, compared with the same search without the split by user_age. When we run "stats dc(user) as UserCount by _time user_age", users whose user_age crosses a bucket boundary within a single _time bucket are counted twice, right?
Some background: user_age is a field containing the time in seconds elapsed since the creation of a specific user, measured at the time of the recorded event. In other words, user_age = (_time - user_created_ts).
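Given user_age = _time - user_created_ts, a user's age keeps growing during the day, so one user can land in two user_age buckets on the same _time day. The Python model below (not Splunk itself) uses hypothetical events for two users, one created at noon on day 0, and compares a per-day distinct count with the sum of per-bucket distinct counts. It assumes day boundaries at multiples of 86400 epoch seconds (UTC), whereas Splunk's span=1d aligns to the local day.

```python
from collections import defaultdict

DAY = 86400  # seconds per day

def day_counts(events):
    """events: (user, _time, user_created_ts) tuples, all in epoch seconds."""
    per_bucket = defaultdict(set)  # (day, age_day) -> users, like dc(user) by _time user_age
    per_day = defaultdict(set)     # day -> users, like timechart dc(user) span=1d
    for user, t, created in events:
        day = t // DAY
        age_day = (t - created) // DAY  # user_age bucketed to whole days
        per_bucket[(day, age_day)].add(user)
        per_day[day].add(user)
    summed = defaultdict(int)
    for (day, _age_day), users in per_bucket.items():
        summed[day] += len(users)  # sum(UserCount) across all age buckets
    return {d: len(u) for d, u in per_day.items()}, dict(summed)

# Hypothetical events: u1 (created at noon on day 0) is seen early and late on day 1,
# so its age moves from bucket 0 to bucket 1 within that day.
events = [
    ("u1", DAY + 3600, 43200),
    ("u1", DAY + 50000, 43200),
    ("u2", DAY + 1000, 0),
]
distinct, summed = day_counts(events)
print(distinct, summed)  # {1: 2} {1: 3}: u1 is counted in two age buckets
```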
I may have overlooked some conditions. Could you try the updated answer?
Thanks for the reply. I changed the span to 1d in both places in your example, because I'm interested in day-sized buckets for both _time and user_age. Unfortunately, I'm not getting the same total distinct user count as with a plain "timechart dc(user) span=1d".
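One possible way to make the totals agree, offered here as an untested suggestion rather than the thread's solution, is to give each user exactly one age bucket per day before counting (for example the youngest age seen that day). In SPL this would roughly correspond to a "stats min(user_age) as user_age by _time user" step before counting users per _time and user_age. The Python model below checks the idea on the same kind of hypothetical events, again assuming UTC day boundaries; the function name is illustrative only.

```python
from collections import defaultdict

DAY = 86400  # seconds per day

def one_bucket_per_user(events):
    """Hypothetical fix: each user gets one age bucket per day (the youngest seen)."""
    youngest = {}  # (day, user) -> smallest age_day observed that day
    for user, t, created in events:
        key = (t // DAY, user)
        age_day = (t - created) // DAY
        youngest[key] = min(age_day, youngest.get(key, age_day))
    counts = defaultdict(int)  # (day, age_day) -> number of users
    for (day, _user), age_day in youngest.items():
        counts[(day, age_day)] += 1
    return dict(counts)

# Hypothetical events: u1 crosses from age bucket 0 to 1 during day 1.
events = [
    ("u1", DAY + 3600, 43200),
    ("u1", DAY + 50000, 43200),
    ("u2", DAY + 1000, 0),
]
counts = one_bucket_per_user(events)
print(counts)                # {(1, 0): 1, (1, 1): 1}
print(sum(counts.values()))  # 2, matching the per-day distinct count
```

Picking the minimum age is an arbitrary choice; the maximum, or the age at the first event of the day, would equally keep each user in a single series.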