Splunk Search

Search for user day 1 retention

arnol229
Explorer

I'm taking a shot at providing metrics on day 1 retention numbers of users in our system (Create a profile and the next day use server again).

I can grab a list of users and the _time they created their profile easily:

sourcetype=Profile event=profile__create | table user _time

I have a general server_log response event that will capture user activity:

sourcetype=server_log event=RESPONSE | bucket _time span=1d | stats values(user) by _time

So, how can I take the profile creation events, look forward a day of the event and see if there was activity by that user?
I've been working in this direction with a subsearch, but have unable to complete the query...

sourcetype=server_log event=RESPONSE [search sourcetype=Profile event=profile__create | table user _time] | bucket _time span=1d | stats values(user) by _time

Any advice/direction is greatly appreciated.


EDIT

Sorry, I realize this was poorly worded.

My expected result is to get the % of users who come back to our site the next day after creating a profile. Daily/Weekly/Monthly granularity.
EX:
User comes to the site on Monday, Jan. 1st and creates a profile.
This fires a profile_create event with the user's id in the field __user_.
The next day, we see a response event in the server_logs, so this user was retained on day 1.
If one hundred people create profiles in a day/week/month, what is the % of users that come back on day 1? (timechart or stats)

Thanks again for your input!

0 Karma

arnol229
Explorer

Worked out a solution, with big thanks to somesoni2 and sundareshr for their help.

sourcetype=Profile event=profile__create | eval mytime=strftime(_time, "%Y%m%d") | eval _time= strptime(mytime, "%Y%m%d") | stats count by _time | rename count as total_count 
| join type=left _time [search sourcetype=Profile event=profile__create | eval mytime=strftime(_time, "%Y%m%d") | eval _time= strptime(mytime, "%Y%m%d") | fields _time user 

| join type=inner _time user [search sourcetype=server_log event=RESPONSE | eval mytime=strftime(_time, "%Y%m%d") | eval _time= strptime(mytime, "%Y%m%d")-86400 | dedup _time user | fields _time user] | stats count by _time | rename count AS "day1"] 

| eval perc=day1/total_count | bucket _time span=$granularity$ | fillnull day1 perc value=0 | stats sum(total_count) AS "Total Profiles Created" sum(day1) AS "User's that returned on day 1" avg(perc) AS "% of Users that returned on day 1" by _time

Couldn't find the best way to format it for readability, so I apologize for that mess. Basically,
1. I search and get all profile create events and reconfigure the _time attribute to snap to the basic date
2. I left join on _time with a big subsearch that will return only events that match (inner join) the user and _time of a profile creation AND a server response that gets booted back to the previous day (the day the profile was created).
3. Now i have a table of _time by dates, total_count of created profiles, and count of events that match activity the day after the user created a profile
4. I bucket this time on a span that can be configured (1d,7d,1mon) in a dashboard, fill null counts with 0, and table things out in stats

Probably unconventional and unoptimized. for a last 30 day search, it takes about 10 - 15 seconds for the query to crunch. I'm sure i'll revisit this at a later date to refactor, but right now it fits my needs.

0 Karma

somesoni2
SplunkTrust
SplunkTrust

Give this a try as well (may perform better, but check if the results are same as your final query)

sourcetype=Profile event=profile__create | bucket span=1d _time | eventstats count as total_count by _time | fields _time user total_count 
| join type=inner _time user [search sourcetype=server_log event=RESPONSE | bucket span=1d _time | dedup _time user | fields _time user event | eval _time=relative_time(_time,"-1d") ] 
| eventstats count(eval(event="RESPONSE")) a day1 by _time | table _time user total_count day1 
| eval perc=day1/total_count | bucket _time span=$granularity$ | fillnull day1 perc value=0 | stats sum(total_count) AS "Total Profiles Created" sum(day1) AS "User's that returned on day 1" avg(perc) AS "% of Users that returned on day 1" by _time
0 Karma

somesoni2
SplunkTrust
SplunkTrust

Not sure If I understood the requirement completely. Are you looking for

1) find users created in Day1
2) Go to Day2 and find if those users identified in step 1 have any activity.

If that is correct, try this,

(sourcetype=server_log event=RESPONSE) OR (sourcetype=Profile event=profile__create) |bucket _time span=1d  | chart count(eval(sourcetype="server_log")) as Activity count(eval(sourcetype="Profile")  as Created by user _time | streamstats current=f window=1 values(Created) as prev by user | eval Result=if(prev>0 AND Activity>0,"Created and activity next day","")

sundareshr
Legend

Try something like this

sourcetype=server_log event=RESPONSE [search sourcetype=Profile event=profile__create | head 1 |  eval earliest=_time+3600 | latest=now() | table earliest latest]
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...