Hi,
I would like to group the daily users by their number of active days during the last 2 weeks. My current search looks like this:
...
| bucket span=1d _time
| stats dc(user_id) AS count BY user_id, _time
| sort 0 + _time
| streamstats window=14 dc(_time) AS active_days BY user_id
| timechart span=1d dc(user_id) BY active_days
But this calculation does not include days of inactivity (because of the stats command) and ends up incorrect results.
How can I achieve my goal?
BR
Heinz
Try this:
...
| timechart span=1d count BY user_id
| streamstats window=14 count(eval(count>0)) AS active_days BY user_id
| timechart span=1d dc(user_id) BY active_days
I think I've found a workaround. But it's veeeery expensive so I'm not really happy with it.
| bucket span=1d _time
| stats dc(account_id) AS dc by user_id_1, _time
| eventstats values(account_id) AS user_id_2
| mvexpand user_id_2
| stats count(eval(if(user_id_1=user_id_2, user_id_1, null()))) AS active BY user_id_2, _time
| sort 0 + user_id_2, _time
| streamstats window=14 global=f dc(eval(if(active=1, _time, null()))) AS active_days BY user_id_2
| search active=1
| timechart span=1d dc(user_id_2) BY active_days limit=14
See my update (new comment at the bottom) that fixes the problem with the solution as originally presented.
Try this:
...
| timechart span=1d count BY user_id
| streamstats window=14 count(eval(count>0)) AS active_days BY user_id
| timechart span=1d dc(user_id) BY active_days
hi,
thanks for your answer. But I think the streamstats doesn't like the "timechart BY" before. It is not adding the active_days field
You are correct; I am working on a correction.
Try this:
...
| timechart span=1d count BY user_id
| untable _time user_id count
| streamstats window=14 count(eval(count>0)) AS active_days BY user_id
| timechart span=1d dc(user_id) BY active_days
Hi woodcock,
thanks a lot for coming back to my problem.
I think we have to do some small adjustments, then this approach shows complete & correct results:
...
| timechart span=1d count BY user_id limit=10000000
| untable _time user_id count
| sort 0 + account_id, _time
| streamstats window=14 global=false count(eval(count>0)) AS active_days BY user_id
| timechart span=1d dc(eval(if(count>0, account_id, null()))) BY active_days
The sort command should have the same effect on the results like global=false, but I hope it helps to improve the peformance of streamstats.
Is there an option that tells a timechart to set the limit=infinite ?
The sort
portion is destroying the function of the search. The results must remain sorted primarily by _time
or the streamstats
part will not work correctly so definitely remove the sort
line. That is a very important catch on the global=false
part though!!! Usually limit=0
disables the limit entirely but I have not tried it on timechart
.
There is no difference, when I remove the sort line. And I don't get how this could destroy the function.
In the end the streamstats command has to analyse a window of 14 days per user_id. After the sort line, the table is so well prepared, that I could even delete the global=false command.
PS: limit=0 is fine in the timechart