
Averaging Out Count In Graph

sajbutler
Path Finder

Every 5 minutes, one of our systems dumps out data on connected users. There is one line per connected user as follows:

In one 5 minute period:

23.06.2010 10:45:00,421,00004794,AKL1D00318,10.18.09,1,GUI,3,
23.06.2010 10:45:00,421,00004795,WLG1D00037,LT12,10.42.11,1,GUI,15
23.06.2010 10:45:00,421,00004796,WLG1D00029,LT12,10.36.35,1,GUI,4
23.06.2010 10:45:00,421,00004799,AKL1D00367,VL03N,09.56.57,2,GUI,882
23.06.2010 10:45:00,421,00004825,syd1sap05.ce.c#,10.36.49,1,RFC,1,
23.06.2010 10:45:00,421,00004826,akl1l00246,XD03,10.34.35,1,GUI,10
23.06.2010 10:45:00,421,00004840,AKL1D00392,10.22.48,1,RFC,1,
23.06.2010 10:45:00,421,00004855,AKL1D00401,10.44.29,1,GUI,23,

The next 5 minute period:

23.06.2010 10:50:00,421,00004794,AKL1D00318,10.18.09,1,GUI,3,
23.06.2010 10:50:00,421,00004794,AKL1D00318,10.33.23,2,GUI,5,
23.06.2010 10:50:00,421,00004795,WLG1D00037,LT12,10.48.49,1,GUI,15
23.06.2010 10:50:00,421,00004796,WLG1D00029,LT12,10.48.21,1,GUI,4
23.06.2010 10:50:00,421,00004799,AKL1D00367,VL03N,09.56.57,2,GUI,882
23.06.2010 10:50:00,421,00004825,syd1sap05.ce.c#,10.36.49,1,RFC,1,
23.06.2010 10:50:00,421,00004826,akl1l00246,XD03,10.34.35,1,GUI,10
23.06.2010 10:50:00,421,00004855,AKL1D00401,10.44.29,1,GUI,23,

I want to plot concurrent users over time. If I use the following, it works.

* | timechart span=5m count

However, if I plot this over a longer period, for example with span=1h, the concurrent-user figure is multiplied by 12 (i.e. the twelve 5-minute counts in each hour are summed).

Is there any way to "average out" my count so that it is aware that the span is no longer 5 minutes and to average accordingly?

blilburne
Explorer

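You can also do it by timecharting twice: first at the interval you want to count at, then at the interval you want to display, averaging the first set of counts. For this data the first span should match the 5-minute dump interval; with 1-minute buckets, the empty minutes between dumps would be averaged in as zeros:

... | timechart span=5m count as interval_count | timechart span=1h avg(interval_count) as avg_concurrent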
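A rough Python model of the two-pass idea, written only to show the arithmetic and not how Splunk implements timechart. It assumes event times are plain epoch seconds and that empty buckets inside an hour count as 0 (assumed to mirror timechart's default filling of empty bins). `two_pass_average` is a hypothetical helper.

```python
from collections import Counter

BUCKET = 300   # first pass: 5-minute buckets, matching the dump interval
WINDOW = 3600  # second pass: 1-hour buckets


def two_pass_average(event_times, bucket=BUCKET, window=WINDOW):
    """Count events per `bucket` seconds, then average those counts per `window`.

    Assumes `window` is a multiple of `bucket`. Buckets with no events
    inside a window are counted as 0.
    """
    # First pass: floor each event time to the start of its bucket and count.
    counts = Counter(t - t % bucket for t in event_times)
    per_window = {}
    # Second pass: for every window that contains events, average its buckets.
    for w in sorted({t - t % window for t in event_times}):
        buckets = range(w, w + window, bucket)
        per_window[w] = sum(counts.get(b, 0) for b in buckets) / len(buckets)
    return per_window


# 80 sessions seen in each of the 12 dumps from 9:00 to 9:55
# (9:00 = 32400 seconds after midnight).
events = [32400 + 300 * i for i in range(12) for _ in range(80)]
print(two_pass_average(events))              # {32400: 80.0}
print(two_pass_average(events, bucket=60))   # {32400: 16.0}
```

With 5-minute buckets the hourly average comes back as 80. With 60-second buckets the same data gives 16, because the 48 empty minutes in the hour are averaged in as zeros, which is why the first span should match how often the data is written.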

Stephen_Sorkin
Splunk Employee

Timechart has a few helpful time-based aggregators, per_second(), per_minute() and per_hour(), which sum the field in parentheses and divide by the number of seconds, minutes or hours in each timechart bin.

Since these functions need a field to sum, you could give each event a weight of 5 (the number of minutes it represents) and write your search as:

... | eval weight = 5 | timechart per_minute(weight)
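To see why a weight of 5 recovers the concurrency, here is the arithmetic that per_minute() is described as doing above, modeled in plain Python. The helper names are hypothetical and this is not Splunk's code; it assumes each dump line stands for one session observed for one 5-minute interval.

```python
# Each dump line stands for one session observed for one 5-minute interval,
# so it gets a weight of 5 (minutes), like `eval weight = 5`.
WEIGHT_MINUTES = 5


def per_minute(values, span_seconds):
    """Sum the values in one bin and divide by the number of minutes in the bin."""
    return sum(values) / (span_seconds / 60)


def weighted_concurrency(n_events, span_seconds):
    """Average concurrent sessions in a bin holding n_events dump lines."""
    return per_minute([WEIGHT_MINUTES] * n_events, span_seconds)


print(weighted_concurrency(960, 3600))  # 80.0  (1-hour bin: 4800 / 60)
print(weighted_concurrency(80, 300))    # 80.0  (5-minute bin: 400 / 5)
```

The result is the same whatever the bin size, because the weight and the per-minute division cancel out the number of dumps in each bin.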

Alternatively, you could post-process the timechart output using the _span field that timechart adds, as below. This normalizes timechart buckets that are larger than 5 minutes (300s):

... | timechart count | eval count = count*300/_span
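The same normalization as a small Python function, assuming dumps arrive exactly every 300 seconds so that a bin of `span` seconds contains span / 300 dumps. `normalize` is a hypothetical name used only for illustration.

```python
SAMPLE_INTERVAL = 300  # seconds between dumps


def normalize(count, span):
    """Scale a bin's event count down to the average per 5-minute dump.

    Only meaningful when span is a multiple of SAMPLE_INTERVAL.
    """
    return count * SAMPLE_INTERVAL / span


# 80 concurrent sessions seen in 5-minute, 1-hour and 1-day bins.
for span, count in [(300, 80), (3600, 960), (86400, 80 * 288)]:
    print(span, normalize(count, span))  # normalized value is 80.0 for each span
```

For bins shorter than 300 seconds the formula would scale counts up rather than down, which is why the answer limits it to buckets larger than 5 minutes.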

jrodman
Splunk Employee

I think you want to find the number of distinct users within each time window?

If so,

* | timechart span=whatever dc(user)

where user is an extracted field representing the specific login.

jrodman
Splunk Employee

The other approach is to find the length of each session (with a transaction search or otherwise), sum those session lengths in seconds, and divide the total by the number of seconds in your time window. That gives you the average concurrency.
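A minimal Python sketch of that calculation, assuming each session's start and end times (in seconds) have already been recovered, for example with a transaction search. The function name and the (start, end) tuple format are hypothetical. As a small refinement, sessions that straddle a window edge contribute only the part that falls inside the window.

```python
def average_concurrency(sessions, window_start, window_end):
    """Average number of open sessions over [window_start, window_end).

    sessions: iterable of (start, end) times in seconds. The time each
    session overlaps the window is summed, then divided by the window length.
    """
    window = window_end - window_start
    total = 0
    for start, end in sessions:
        # Clip the session to the window; non-overlapping sessions add nothing.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            total += overlap
    return total / window


# 80 sessions open from 9:00 to 10:00 (32400 to 36000 seconds after midnight).
print(average_concurrency([(32400, 36000)] * 80, 32400, 36000))  # 80.0
# One session open for the first half of the hour only.
print(average_concurrency([(32400, 34200)], 32400, 36000))       # 0.5
```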

jrodman
Splunk Employee

It sounds like you want something fairly complex. You want the number of sessions open at a given point in time, and then the average of those numbers over a longer period. Averaging an instantaneous measurement over a period of time would require infinitely many samples, or else the acceptance of some quantization error. The span is your quantization, so if you have a field representing the session ID, you could simply distinct-count it over a 5m span to get your answer.

sideview
SplunkTrust

But dc(user) will only count distinct users. count(user) would count the number of events that have any user value at all, whereas dc(user) counts only the distinct values per bucket.
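A small Python illustration of that difference, using the eight lines of the 10:50 dump from the question. It assumes, purely for illustration, that the third comma-separated field identifies the user; the thread does not say what that field means, and `count_and_dc` is a hypothetical helper.

```python
# The eight lines of the 10:50 dump from the question.
DUMP_1050 = """\
23.06.2010 10:50:00,421,00004794,AKL1D00318,10.18.09,1,GUI,3,
23.06.2010 10:50:00,421,00004794,AKL1D00318,10.33.23,2,GUI,5,
23.06.2010 10:50:00,421,00004795,WLG1D00037,LT12,10.48.49,1,GUI,15
23.06.2010 10:50:00,421,00004796,WLG1D00029,LT12,10.48.21,1,GUI,4
23.06.2010 10:50:00,421,00004799,AKL1D00367,VL03N,09.56.57,2,GUI,882
23.06.2010 10:50:00,421,00004825,syd1sap05.ce.c#,10.36.49,1,RFC,1,
23.06.2010 10:50:00,421,00004826,akl1l00246,XD03,10.34.35,1,GUI,10
23.06.2010 10:50:00,421,00004855,AKL1D00401,10.44.29,1,GUI,23,
"""


def count_and_dc(lines, field_index=2):
    """Return (count, distinct count) of one comma-separated field.

    field_index=2 picks the third field, assumed here to be the user.
    """
    values = [line.split(",")[field_index] for line in lines if line.strip()]
    return len(values), len(set(values))


print(count_and_dc(DUMP_1050.splitlines()))  # (8, 7)
```

Because 00004794 appears with two sessions in that dump, the event count is 8 but the distinct count is 7, so dc() would undercount sessions.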

sajbutler
Path Finder

Almost. It is common for a user to have more than one session open to the application, so if a user has 3 concurrent sessions open, we want to record all 3. What I am after is the average number of concurrent sessions for the particular time span.

If 80 users log on at 9am and log off at 9:59am, 960 lines (i.e. 80 x 12) will be generated:
9:00 - 80 lines
9:05 - 80 lines
snip
9:55 - 80 lines
10:00 - 0 lines

If I run * | timechart span=1h count, it returns 960, whereas the real average number of concurrent users was 80 (i.e. 960 divided by 12).
