Getting Data In

Keep Last Daily Activity Per User

HeinzWaescher
Motivator

Hi,

i would like to keep the last event/activity of a day for every user (so filter out all other events).

I used this command:

bucket _time span=1d | dedup user, _time | timechart span=1d dc(user)

This is just an testexample I tried out. The same result should appear, when I only use | timechart span=1d dc(user). But there a very small differences.

Is there a better function that can be used here?

Thanks in advance

Heinz

Tags (2)
0 Karma

HeinzWaescher
Motivator

Hi everybody and thanks for your answers.

Unfortunately, I dont think it's possible to adjust your ideas to my search.
To avoid misunderstandings, i will post my whole search and try to explain the goals.

What i want to achieve is a timechart, which shows the daily active (unique) users and the average age of these users on that day. The age (in days) of user A should be defined in every event as the timespan between his current eventtimestamp and his first activity in the dataset. The first activity (in seconds) of each user is defined in a lookuptable "first_activity.csv".

To achieve, that every user is only taken into account once per day to calculate the average age ,i just want to use the age of each users last daily event.

So what I've done is this:

sourcetype=... | lookup first_activity.csv user OUTPUT firstactivity | eval event_time= eventtime/1000 | eval age=floor((event_time-first_activity)/3600/24)+1 | bucket _time span=1d| dedup user,_time| timechart span=1d dc(user) AS Uniqe_Users,avg(age) AS Avg_Age | eval Avg_Age=round(Avg_Age,0)

Best
Heinz

0 Karma

HeinzWaescher
Motivator

Ich habe gedacht die Zeitstempel werden flachgeklopft, aber die Reihenfolge der Events bleibt bestehen. Entsprechend würde durch "dedup" dann das letzte gewählt. (Falls nicht, wäre das jetzt auch nicht sooo dramatisch. Hauptsache es bleibt erstmal ein einziges pro Tag übrig)

In der Lookuptabelle steht der timestamp der "first activity" in sekunden. Das kann ja aber keinen Einfluss auf die Unique_Users haben.

Das avg_age wird sich durch "dedup" ändern, da pro User nur noch ein age verwendet wird.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Moin Heinz,

die Kombination von bucket und dedup gibt keine Garantien darüber, welches Event behalten wird. Schließlich klopft das bucket alle Zeitstempel flach, bevor aussortiert wird.

Am dc(user) sollte das in der Tat nichts ändern. Bei avg(age) habe ich aber Bedenken - ist pro User und Tag vor bucket das age identisch? Falls in der Lookup-Tabelle "krumme" Daten stehen, kann es untertägig den Sprung von einem Tag auf den nächsten geben.

Viele Grüße aus Kiel,
Martin

For the English speakers: In principle the issue is solved with HeinzWaescher's query.

0 Karma

HeinzWaescher
Motivator

Hi Martin,

im Grunde tut die Suche was ich möchte. Ich habe ein paar Stichproben für einzelne User gemacht. Allerdings scheint mein Ansatz "das letzte Event des Tages pro User" zu behalten" nicht optimal zu sein. Wenn ich

"| bucket _time span=1d| dedup user,_time"

aus meiner Suche entferne, dürfte sich ja eigentlich nichts an der Anzahl der Unique_Users pro Tag ändern. Aber es kommt zu vereinzelnten (sehr geringen) Veränderungen.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

So... does your search now solve your problem? I'm not quite sure if I understand you correctly in that regard 😞

My offer for German-language help still stands if necessary, even beyond this single question.

0 Karma

kristian_kolb
Ultra Champion

While the answers below seem ok, you already have something similar; perhaps you could expand this with more fields;

... | bucket _time span=1d | dedup user, _time, sourcetype, host

if you want have events on a per sourcetype/host basis.

..and dc(user) will not produce anything near what the title of the question says.

/k

0 Karma

somesoni2
SplunkTrust
SplunkTrust

You can try the following

"base search"| stats last(*) as * by user,date_mday

This should give you all the fields from the last event for each user for each day.

martin_mueller
SplunkTrust
SplunkTrust

I'm not sure why you're trying to answer "last daily activity" with a distinct count... here's a thought:

index=_internal earliest=-h@h latest=@h | timechart span=15m first(_raw) by sourcetype

This looks at an hour of _internal data and gives you the last activity per sourcetype. I'm sure you can adapt this for your specific needs.

PS: If you need German-language help (judging by your name) we can certainly work something out.

Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...