Hi,
I have a field called "UserID" and a DateActive field. I'm looking to make a bar chart where each bar has a value equal to the average # of unique users per day in a month divided by the total # of active users of that month, for every month in the year (Lets call this value Stickiness). For example, if the month of Jan averages 3,000 unique active users per day and has 10,000 unique active users in the entire month of Jan, then I want the Stickiness for Jan to be 3,000/10,000 or .3.
I'm able to get both unique active users per day and unique active users per month in separate queries but am having trouble doing it together to do the division since I am using
timechart span=1(day or month) dc(UserID) and using eval _time=DateModified
So how can I obtain both the unique daily active user and unique monthly active user counts in a single query so that I can perform division on them? I also want to display the data using a bar graph that has one bar for the Stickiness of each month in a 12 month window? I'd normally use timechart but i'm not sure how for a more complex situation like this.
Might have worded it poorly, let me know so I can clarify.
Give this a try
your base search | bucket span=1d _time | stats dc(UserID) as dailyUniq count(UserID) as DailyActive by _time | timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers sum(DailyActive) as TotalActive | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/TotalActive,2) | table Month Stickiness
Update#1
This should givev stickiness as ratio of Daily Average Uniq Users and Unique users for the month. Will be little slower than previous one.
your base search | bucket span=1d _time | eventstats dc(UserID) as dailyUniq by _time
| timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers dc(UserID) as MonthlyUniq | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/MonthlyUniq,2) | table Month Stickiness
Not sure where the other posts/comments are gone from both you and me. Could've reached the limit on number of comments under one answer, so starting a new answer.
Try this
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID
| eval Month=strftime(_time,"%Y%m %B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
| eval Stickiness=round(numerator/denom/MonthlyUniq,2) | table Month Stickiness| eval Month=mvindex(split(Month," "),-1)
OR
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID
| eval Month=strftime(_time,"%Y%m %B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
| eval Stickiness=round(numerator/denom/MonthlyUniq,2) | table Month Stickiness | rex field=Month "(?<sortfield>\d+)\s(?<Month>\w+)" | sort num(sortfield) | fields - sortfield
They both order it the same- alphabetically. Perhaps a 2 timechart method would be better since it would automatically order it? I"m looking for the data from the last 12 months, aka from now (August) to Sept of 2015 sorted chronologically
Hmmm. So try this. Since it's using timechart, the order should be in time chronological order.
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID | timechart span=1mon count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth | eval Stickiness=round(numerator/denom/MonthlyUniq,2) | eval Month=strftime(_time,"%B")| table Month Stickiness
weird, so the values im getting are way off and i'm comparing the denom and numerator to the working queries denom and numerator. Denom in the timechart one is always 1, and numerator is wrong too, much smaller. Difficult problem haha, one fix causes another issue.
I can do some kind of manual sort where I check the current date using now() and then use that to order the previous months and check to see that the year is either same as now() or -1 if the month number is greater than the current month
Grrrr.. it's getting a lil frustrating.... Try this (high hope with this)
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID
| eval Month=strftime(_time,"%Y%m %B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
| eval Stickiness=round(numerator/denom/MonthlyUniq,2) | table Month Stickiness | eval sortfield=strptime(Month,"%Y%m %B") | sort num(sortfield) | eval Month=strftime(sortfield,"%B") | table Month Stickiness
Hey that worked! There was one small thing where the "Month" field wasn't displaying despite the Stickiness being sorted in the correct order, I had to make a small change but otherwise it worked perfectly. I changed the end of the query to be like this, for some reason this part
eval Month=strftime(sortfield,"%B")
wasn't working so I had to do use a regex instead.
| eval sortfield=strptime(Month,"%Y%m %b") | sort num(sortfield) | rex field=Month "(?<Month>[A-Z][a-z][a-z])" | table Month Stickiness
Thanks again, I know this was kinda a frustrating problem.
Give this a try
your base search | bucket span=1d _time | stats dc(UserID) as dailyUniq count(UserID) as DailyActive by _time | timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers sum(DailyActive) as TotalActive | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/TotalActive,2) | table Month Stickiness
Update#1
This should givev stickiness as ratio of Daily Average Uniq Users and Unique users for the month. Will be little slower than previous one.
your base search | bucket span=1d _time | eventstats dc(UserID) as dailyUniq by _time
| timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers dc(UserID) as MonthlyUniq | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/MonthlyUniq,2) | table Month Stickiness
Thanks, there is one thing. Instead of TotalActive I actually wanted Monthy Unique Users, essentailly the same thing as the dailyUniq you calculated except for each month. Not sure how to do that in addition to getting the dailyUniq as you described
Try the updated answer
It's really close, for some reason the average unique daily users is off by a couple hundred though. I removed your search and replaced eventstats with stats to look like this:
base search | bucket span=1d _time | stats dc(UserID) as dailyUniq by _time | eval Month=strftime(_time,"%b") | where Month="May"
and got the correct unique daily users for each day in the month of May. Somehow though the average isnt calculated correctly which is confusing me, for instance in May I'm getting avg=~2100 when it should be ~1800.
Full code below - PARTICIPANT_AOID is UserID
eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | bucket span=1d _time | eventstats dc(PARTICIPANT_AOID) as DAU by _time | timechart span=1month avg(DAU) as avg_DAU, dc(PARTICIPANT_AOID) as MAU | eval Month=strftime(_time,"%m") | eval Month_Name = strftime(_time,"%b") | eval Year=strftime(_time,"%Y") | eval CurrDate = now() | eval CurrMonth=strftime(CurrDate,"%m") | eval CurrYear=strftime(CurrDate,"%Y") | where (Year=CurrYear) OR (Year=(CurrYear-1) AND (Month > CurrMonth)) | eval DAU/MAU=round(avg_DAU/MAU,2) | Table Month_Name DAU/MAU avg_DAU MAU
most of the time stuff is just to get the correct time windows because of how we indexed the data
edit: just realized i could have used earliest and made it a lot easier but that shouldn't be the issue
What is the time range for the search? The time-range works on the _time which splunk assigned during indexing. Here you're overriding the _time from field LastModifiedDate, so it could be possible that some of the events are not included in the base search itself. Can you try this and see if the average is correct?
your base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1d dc(PARTICIPANT_AOID) as DAU | timechart span=1mon avg(DAU) as avg_DAU
Do you know what is wrong? I plaeyd around some more and it seems to have to do with the bucket/event stats part.
base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | bucket span=1d _time | eventstats dc(PARTICIPANT_AOID) as DAU by _time | timechart span=1mon avg(DAU) as avg_DAU | eval Month_Name=strftime(_time,"%b") | table Month_Name avg_DAU
gives the wrong average.
base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1d dc(PARTICIPANT_AOID) as DAU | timechart span=1month avg(DAU) as avg_DAU, dc(PARTICIPANT_AOID) as MAU | eval Month_Name=strftime(_time,"%b") | Table Month_Name avg_DAU MAU DAU/MAU
gives the correct average, 0 for all MAU though
Give this a try. If bucket is the issue, this should work fine
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | eventstats dc(UserID) as dailyUniq by _time | timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers dc(UserID) as MonthlyUniq | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/MonthlyUniq,2) | table Month Stickiness
Else, give this a try as well
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID
| eventstats dc(UserID) as dailyUniq by _time | timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers dc(UserID) as MonthlyUniq | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/MonthlyUniq,2) | table Month Stickiness
still nothing, both return the same avg dailyUniq similar to the previous queries, although interestingly enough they are different from each other by varying values between 1-10 where where the value is equal to ~2000
base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1d dc(PARTICIPANT_AOID) as DAU | timechart span=1mon avg(DAU) as avg_DAU | eval Month = strftime(_time,"%b") | table Month avg_DAU
is the only query that returns the correct avg_DAU so far, is there some way to do it using 2 timecharts that will also get me the Unique monthly users?
It seems like the eventstats is the one causing problem. It could be due to too many rows for processing? Was the 2nd query closer to the actual average?
Give this last one a try else, we'll have two timechart method
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID
| eval Month=strftime(_time,"%B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
| eval Stickiness=round(numerator/denom/MonthlyUniq,2) | table Month Stickiness
That worked! I dont remember which query was closer btw but they were only different by say 5 so one query returned avg dailyUser=2100 and other would return 2105 while the real value was 1800.
Do you by any chance know how I can organize it by months rather than alphabetically or should I manually sort it with something like case? So if I ran it now in August it would give me in order
July, June, May, April, March,Feb, Jan,Dec 2015...
Try this
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID
| eval Month=strftime(_time,"%m %B %Y") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
| eval Stickiness=round(numerator/denom/MonthlyUniq,2) | table Month Stickiness| eval Month=mvindex(split(Month," "),1)
Can't view any of my posts or your posts past the suggestion with eval Month=mvindex outside of my email.
I get this error when trying your regex.
"Error in 'rex' command: Encountered the following error while compiling the regex '(?<\d+)\s(?<\w+)': Regex: syntax error in subpattern name (missing terminator)"
This actually removed all of the Months, the MOnth column is now blank
How about this?
base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID
| eval Month=strftime(_time,"%Y%m %B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
| eval Stickiness=round(numerator/denom/MonthlyUniq,2) | table Month Stickiness | rex field=Month "(?<sortfield>\d+)\s(?<Month>\w+)" | sort num(sortfield) | field - sortfield