Splunk Search

How do I get Unique users per day and per month in one query and then divide them

pzhou07920
Explorer

Hi,

I have a field called "UserID" and a DateActive field. I'm looking to make a bar chart where each bar has a value equal to the average number of unique users per day in a month divided by the total number of active users of that month, for every month in the year (let's call this value Stickiness). For example, if the month of Jan averages 3,000 unique active users per day and has 10,000 unique active users across the entire month of Jan, then I want the Stickiness for Jan to be 3,000/10,000, or 0.3.

I'm able to get both unique active users per day and unique active users per month in separate queries, but I'm having trouble combining them to do the division, since I am using

timechart span=1d (or span=1mon) dc(UserID) together with eval _time=DateModified

So how can I obtain both the unique daily active user count and the unique monthly active user count in a single query so that I can divide one by the other? I also want to display the data as a bar graph with one bar for the Stickiness of each month in a 12-month window. I'd normally use timechart, but I'm not sure how to apply it to a more complex situation like this.
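Roughly, the two separate searches look like the sketch below (this assumes DateModified already holds an epoch timestamp; if it's a string it would need a strptime first), and what I want is the monthly average of the first result divided by the second:

 your base search | eval _time=DateModified | timechart span=1d dc(UserID) as DailyUniq
 your base search | eval _time=DateModified | timechart span=1mon dc(UserID) as MonthlyUniq

(DailyUniq and MonthlyUniq are just placeholder names for this sketch.)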

I might have worded this poorly; let me know so I can clarify.


somesoni2
SplunkTrust

Not sure where the other posts/comments from both of us have gone. We could've reached the limit on the number of comments under one answer, so I'm starting a new answer.

Try this

 base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID 
  | eval Month=strftime(_time,"%Y%m %B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
  | eval Stickiness=round(numerator/denom/UniqMonth,2) | table Month Stickiness | eval Month=mvindex(split(Month," "),-1)

OR

 base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID 
      | eval Month=strftime(_time,"%Y%m %B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
      | eval Stickiness=round(numerator/denom/UniqMonth,2) | table Month Stickiness | rex field=Month "(?<sortfield>\d+)\s(?<Month>\w+)" | sort num(sortfield) | fields - sortfield

pzhou07920
Explorer

They both order it the same way: alphabetically. Perhaps a two-timechart method would be better, since it would order it automatically? I'm looking for the data from the last 12 months, i.e. from now (August) back to September 2015, sorted chronologically.


somesoni2
SplunkTrust

Hmmm. So try this. Since it's using timechart, the order should be chronological.

 base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID | timechart span=1mon count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth | eval Stickiness=round(numerator/denom/UniqMonth,2) | eval Month=strftime(_time,"%B") | table Month Stickiness

pzhou07920
Explorer

Weird, so the values I'm getting are way off, and I'm comparing the denom and numerator to the working query's denom and numerator: denom in the timechart one is always 1, and the numerator is wrong too, much smaller. Difficult problem, haha, one fix causes another issue.

I can do some kind of manual sort where I check the current date using now(), use that to order the previous months, and check that the year is either the same as now()'s year, or one less when the month number is greater than the current month.
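Something like this, maybe (just a sketch; Month and Year here are the two-digit month and four-digit year taken from _time, the same check I use in my full query further down):

 ... | eval Month=strftime(_time,"%m") | eval Year=strftime(_time,"%Y") | eval CurrMonth=strftime(now(),"%m") | eval CurrYear=strftime(now(),"%Y") | where (Year=CurrYear) OR (Year=(CurrYear-1) AND (Month > CurrMonth))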


somesoni2
SplunkTrust

Grrrr... it's getting a little frustrating... Try this (high hopes for this one)

  base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID 
       | eval Month=strftime(_time,"%Y%m %B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
       | eval Stickiness=round(numerator/denom/UniqMonth,2) | table Month Stickiness | eval sortfield=strptime(Month,"%Y%m %B") | sort num(sortfield) | eval Month=strftime(sortfield,"%B") | table Month Stickiness

pzhou07920
Explorer

Hey, that worked! There was one small thing: the "Month" field wasn't displaying even though the Stickiness values were sorted in the correct order, so I had to make a small change, but otherwise it worked perfectly. I changed the end of the query, because for some reason this part

eval Month=strftime(sortfield,"%B")

wasn't working, so I had to use a regex instead:

| eval sortfield=strptime(Month,"%Y%m %b") | sort num(sortfield) | rex field=Month "(?<Month>[A-Z][a-z][a-z])" | table Month Stickiness

Thanks again, I know this was kind of a frustrating problem.


somesoni2
SplunkTrust

Give this a try

your base search | bucket span=1d _time | stats dc(UserID) as dailyUniq count(UserID) as DailyActive by _time | timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers sum(DailyActive) as TotalActive | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/TotalActive,2) | table Month Stickiness

Update #1
This should give Stickiness as the ratio of daily average unique users to unique users for the month. It will be a little slower than the previous one.

your base search | bucket span=1d _time | eventstats dc(UserID) as dailyUniq by _time
| timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers dc(UserID) as MonthlyUniq | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/MonthlyUniq,2) | table Month Stickiness

pzhou07920
Explorer

Thanks, there is one thing: instead of TotalActive I actually wanted monthly unique users, essentially the same thing as the dailyUniq you calculated, except per month. Not sure how to do that in addition to getting the dailyUniq as you described.


somesoni2
SplunkTrust

Try the updated answer


pzhou07920
Explorer

It's really close; for some reason, though, the average unique daily users is off by a couple hundred. To check, I stripped your search down and replaced eventstats with stats, like this:

base search | bucket span=1d _time | stats dc(UserID) as dailyUniq by _time | eval Month=strftime(_time,"%b") | where Month="May"

and got the correct unique daily users for each day in the month of May. Somehow, though, the average isn't calculated correctly, which is confusing me; for instance, in May I'm getting an avg of ~2100 when it should be ~1800.

Full code below (PARTICIPANT_AOID is the UserID field):

eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | bucket span=1d _time | eventstats dc(PARTICIPANT_AOID) as DAU by _time | timechart span=1month avg(DAU) as avg_DAU, dc(PARTICIPANT_AOID) as MAU | eval Month=strftime(_time,"%m") | eval Month_Name = strftime(_time,"%b") | eval Year=strftime(_time,"%Y") | eval CurrDate = now() | eval CurrMonth=strftime(CurrDate,"%m") | eval CurrYear=strftime(CurrDate,"%Y") | where (Year=CurrYear) OR (Year=(CurrYear-1) AND (Month > CurrMonth)) | eval "DAU/MAU"=round(avg_DAU/MAU,2) | table Month_Name DAU/MAU avg_DAU MAU

Most of the time handling is just to get the correct time windows, because of how we indexed the data.

Edit: I just realized I could have used earliest and made it a lot easier, but that shouldn't be the issue.


somesoni2
SplunkTrust

What is the time range for the search? The time range works on the _time that Splunk assigned during indexing. Here you're overriding _time with the field LastModifiedDate, so it's possible that some events are not included in the base search itself. Can you try this and see if the average is correct?

your base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1d dc(PARTICIPANT_AOID) as DAU | timechart span=1mon avg(DAU) as avg_DAU
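One way to rule that out, just as a sanity check, is to run over a deliberately wide window, e.g. earliest=0 (or "All time" in the time picker), so that events whose indexed _time differs from LastModifiedDate aren't dropped before the eval. A sketch:

your base search earliest=0 | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1d dc(PARTICIPANT_AOID) as DAU | timechart span=1mon avg(DAU) as avg_DAU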

pzhou07920
Explorer

Do you know what is wrong? I played around some more and it seems to have to do with the bucket/eventstats part.

base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | bucket span=1d _time | eventstats dc(PARTICIPANT_AOID) as DAU by _time | timechart span=1mon avg(DAU) as avg_DAU | eval Month_Name=strftime(_time,"%b") | table Month_Name avg_DAU

gives the wrong average.

base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1d dc(PARTICIPANT_AOID) as DAU | timechart span=1month avg(DAU) as avg_DAU, dc(PARTICIPANT_AOID) as MAU | eval Month_Name=strftime(_time,"%b") | Table Month_Name avg_DAU MAU DAU/MAU

gives the correct average, though MAU comes out as 0 for every month.


somesoni2
SplunkTrust

Give this a try. If bucket is the issue, this should work fine

base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | eventstats dc(UserID) as dailyUniq by _time | timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers dc(UserID) as MonthlyUniq | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/MonthlyUniq,2) | table Month Stickiness

Else, give this a try as well

base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID 
| eventstats dc(UserID) as dailyUniq by _time | timechart span=1mon avg(dailyUniq) as DailyAvgUniqUsers dc(UserID) as MonthlyUniq | eval Month=strftime(_time,"%B") | eval Stickiness=round(DailyAvgUniqUsers/MonthlyUniq,2) | table Month Stickiness

pzhou07920
Explorer

Still nothing; both return roughly the same avg dailyUniq as the previous queries, although interestingly enough they differ from each other by varying amounts between 1 and 10, where the value itself is around 2000.

base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1d dc(PARTICIPANT_AOID) as DAU | timechart span=1mon avg(DAU) as avg_DAU | eval Month = strftime(_time,"%b") | table Month avg_DAU

is the only query that returns the correct avg_DAU so far. Is there some way to do it using two timecharts that will also get me the unique monthly users?


somesoni2
SplunkTrust

It seems like the eventstats is the one causing the problem. It could be due to too many rows to process? Was the 2nd query closer to the actual average?

Give this last one a try; otherwise, we'll go with the two-timechart method (sketched below).

 base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID 
| eval Month=strftime(_time,"%B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
| eval Stickiness=round(numerator/denom/UniqMonth,2) | table Month Stickiness
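For the two-timechart route: after the first timechart, the raw UserID/PARTICIPANT_AOID field is gone, which is presumably why MAU came out as 0 earlier, so the monthly distinct count has to be brought back in some other way. A rough sketch using appendcols (just a sketch; it assumes both timecharts cover the same time range and therefore produce the same month rows in the same order):

 base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1d dc(UserID) as DAU | timechart span=1mon avg(DAU) as avg_DAU
 | appendcols [ search base search | eval _time = strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N") | timechart span=1mon dc(UserID) as MAU ]
 | eval Stickiness=round(avg_DAU/MAU,2) | eval Month=strftime(_time,"%B") | table Month Stickiness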

pzhou07920
Explorer

That worked! I don't remember which query was closer, btw, but they only differed by about 5, so one query returned avg dailyUniq=2100 and the other 2105, while the real value was ~1800.

Do you by any chance know how I can organize it by month rather than alphabetically, or should I manually sort it with something like case? So if I ran it now in August it would give me, in order:
July, June, May, April, March, Feb, Jan, Dec 2015...
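Something like this, maybe (just a sketch; it only orders month names within a single calendar year, so on its own it can't tell that Dec 2015 should come after Jan of the current year):

 | eval monthnum=case(Month=="January",1, Month=="February",2, Month=="March",3, Month=="April",4, Month=="May",5, Month=="June",6, Month=="July",7, Month=="August",8, Month=="September",9, Month=="October",10, Month=="November",11, Month=="December",12) | sort -num(monthnum) | fields - monthnum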


somesoni2
SplunkTrust

Try this

 base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID 
 | eval Month=strftime(_time,"%m %B %Y") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
 | eval Stickiness=round(numerator/denom/UniqMonth,2) | table Month Stickiness | eval Month=mvindex(split(Month," "),1)

pzhou07920
Explorer

I can't view any of my posts or your posts past the suggestion with eval Month=mvindex, except in my email.

I get this error when trying your regex (it looks like the capture-group names got stripped somewhere along the way):
"Error in 'rex' command: Encountered the following error while compiling the regex '(?<\d+)\s(?<\w+)': Regex: syntax error in subpattern name (missing terminator)"


pzhou07920
Explorer

This actually removed all of the months; the Month column is now blank.


somesoni2
SplunkTrust

How about this?

 base search | eval _time = relative_time(strptime(LastModifiedDate, "%Y-%m-%d %H:%M:%S.%3N"),"@d") | stats count by _time UserID 
  | eval Month=strftime(_time,"%Y%m %B") | stats count(_time) as numerator dc(_time) as denom dc(UserID) as UniqMonth by Month
  | eval Stickiness=round(numerator/denom/UniqMonth,2) | table Month Stickiness | rex field=Month "(?<sortfield>\d+)\s(?<Month>\w+)" | sort num(sortfield) | fields - sortfield