Splunk Search

Why does streamstats reset twice in the dataset when it shouldn't?

DigitalBibleSoc
New Member

Hey all, we are having a bit of trouble with the streamstats command, as the title indicates. The following search returns an uninterrupted ascending count in the field "test" when the reset_before parameter checks for a value that can never exist ("bar"):

| from datamodel:"cdnlogic"
| streamstats global=f window=2 current=t latest(_time) AS latest_time earliest(_time) AS earliest_time BY logic_uid
| eval logic_time_difference = 'latest_time' - 'earliest_time'
| eval logic_time_difference = if('logic_time_difference' >= 1200, "foo", 'logic_time_difference')
| streamstats global=f window=0 current=t reset_before=("logic_time_difference == \"bar\"") count AS test BY logic_uid
| table logic_time_difference, test

returning:

diff   test
0      1
961    2
5      3
418    4
407    5
20     6
23     7
1      8
foo    9
1      10
1      11
8      12
1      13
3      14
3      15
1      16
3      17
1      18
6      19
1      20

However, when reset_before is changed to look for "foo" instead of "bar", the count resets twice where it shouldn't in the dataset I am testing with:

| from datamodel:"cdnlogic"
| streamstats global=f window=2 current=t latest(_time) AS latest_time earliest(_time) AS earliest_time BY logic_uid
| eval logic_time_difference = 'latest_time' - 'earliest_time'
| eval logic_time_difference = if('logic_time_difference' >= 1200, "foo", 'logic_time_difference')
| streamstats global=f window=0 current=t reset_before=("logic_time_difference == \"foo\"") count AS test BY logic_uid
| table logic_time_difference, test

returns:

diff   test
0      1
961    1   <-- ?
5      2
418    3
407    1   <-- ?
20     2
23     3
1      4
foo    1   <-- good
1      2
1      3
8      4
1      5
3      6
3      7
1      8
3      9
1      10
6      11
1      12
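
In case it matters: we have not ruled out that those unexpected resets coincide with a change in logic_uid, since the table above doesn't show the group key. Changing the final pipe would make that visible:

| table logic_uid, logic_time_difference, test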

Any guidance on this issue, as well as on consolidating the search into a single streamstats statement, is welcome.


493669
Super Champion

It seems the BY logic_uid clause is causing the count to reset abruptly, so try streamstats without the BY clause:

| from datamodel:"cdnlogic"
| streamstats global=f window=2 current=t latest(_time) AS latest_time earliest(_time) AS earliest_time BY logic_uid
| eval logic_time_difference = 'latest_time' - 'earliest_time'
| eval logic_time_difference = if('logic_time_difference' >= 1200, "foo", 'logic_time_difference')
| streamstats global=f window=0 current=t reset_before=("logic_time_difference == \"foo\"") count AS test
| table logic_time_difference, test
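
If you do need the count kept per logic_uid, another thing to try (just a guess on my part, assuming the unexpected resets come from events of different logic_uid values interleaving) is to sort so each group is contiguous before the second streamstats:

| sort 0 logic_uid _time
| streamstats global=f window=0 current=t reset_before=("logic_time_difference == \"foo\"") count AS test BY logic_uid
| table logic_uid, logic_time_difference, test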

DigitalBibleSoc
New Member

Unfortunately, that causes it to count the instances of ALL logic_uid values together:

diff   test
0      3
961    13
5      14
418    22
407    4
20     5
23     6
1      7
foo    1
1      2
1      3
8      4
1      5
3      6
3      7
1      8
3      10
1      11
6      12
1      13

Ideally, a single streamstats statement would make the most sense. Sadly, we haven't been able to figure out a way.
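
One pattern we are considering (a sketch only, not yet verified against our data, and it assumes events are sorted so each logic_uid is contiguous) is to turn the reset marker into a running session id and then count within it, so no reset_before is needed at all:

| sort 0 logic_uid _time
| streamstats count(eval(logic_time_difference == "foo")) AS session_id BY logic_uid
| streamstats count AS test BY logic_uid session_id

Because streamstats includes the current event by default, session_id increments on the "foo" row itself, so the count restarts at 1 exactly where reset_before would restart it.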


493669
Super Champion

DigitalBibleSoc
New Member

Okay, that answer seems specific to counts. The actual logic we are attempting against the dataset is not as simple as a count; to make the question easier to illustrate, I swapped the more complex (and, we assumed, irrelevant) logic for a count so the table output would make sense. In actuality, we are taking sum('logic_time_difference') AS logic_time_sum up to the reset point, AND taking latest('logic_nid') AS logic_nid to replace the existing logic_nid field of the events inside the series.
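
If the session-id sketch above holds up, the same grouping should extend to these aggregates too (again unverified against our data; sum() simply ignores the non-numeric "foo" rows, and eventstats broadcasts the per-series latest logic_nid back onto every event in that series):

| sort 0 logic_uid _time
| streamstats count(eval(logic_time_difference == "foo")) AS session_id BY logic_uid
| streamstats sum(logic_time_difference) AS logic_time_sum BY logic_uid session_id
| eventstats latest(logic_nid) AS logic_nid BY logic_uid session_id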


DigitalBibleSoc
New Member

We did take a look at that before posting this. I'll personally take a closer look at it now/tomorrow and mark as answer if that's the key. Thanks!
