Splunk Search

Cannot figure out how to do bucketing in Splunk using return values from previous row

Avaclon
New Member

Been at this for a day or two. Can't figure it out.

I have a list of efforts in a column [1 2 3 2 1 3 1 1 1 5]
I have a list of available working hours per Sprint [5 3 3 6 2 2 2]

I want to be able to generate something like
Effort | Sprint
1 | 1
2 | 1
3 | 2
2 | 3
1 | 3
3 | 4
1 | 4
1 | 4
1 | 4
5 | 7

I would do this in DB with a temp table + Cursor, but that doesn't seem to work in Splunk.

I think it has something to do with with the streamstats, but I can't wrap my head around how / where to store the temporary variable, and then distributing properly. Key here is that if it doesn't fit into the bucket, it should abandon the rest of the values remaining in the bucket. Basically, that item #4 needs to be smart to be in Sprint 3, not in Sprint 2, so the sum / dividing would work.

Any help is much appreciated!

Tags (2)
0 Karma

DalJeanis
Legend

You need to be comparing the sums up to that point, rather than the individual effort and work numbers. To know which sprint an item would be finished in, the net total work hours available must be the lowest number greater than the total effort required.

The obvious way to do this is with a cross join... which should be discouraged for large implementations...

Here's a run-anywhere sample to show you what we mean...

| makeresults 
| fields - _time
| eval efforts="1 2 3 2 1 3 1 1 1 5" 
| makemv efforts
| mvexpand efforts
| rename COMMENT as "The above produces the individual effort records" 

| rename COMMENT as "This gives them each a record number, and creates a running total." 
| streamstats count as eff_recno sum(efforts) as sumOfEfforts

| rename COMMENT as "Set a junk field to create a cross join." 
| eval junk=1
| join junk max=1000 [
    | makeresults 
    | eval workhours="5 3 3 6 2 2 2" 
    | makemv workhours
    | mvexpand workhours
    | rename COMMENT as "The above produces the individual work records" 


    | rename COMMENT as "This gives them each a record number, and creates a running total." 
    | streamstats count as work_recno sum(workhours) as sumOfWorkHours

    | rename COMMENT as "Set a junk field to create a cross join." 
    | eval junk=1
    ]

| rename COMMENT as "Kill fields where the effort required is greater than the amount available." 
| where sumOfEfforts<sumOfWorkHours

| rename COMMENT as "Accept only the first record where work is greater than the effort required." 
| dedup eff_recno
0 Karma

Richfez
SplunkTrust
SplunkTrust

Could you better describe how Efforts relate to Sprints? How would you know if it would or would not fit into a bucket? What's a bucket in this case? Is this, like, what, Agile stuff? Scrums and things?

It may help to explain what all this is in a way that we simple-minded folk who aren't "Agile-fluent" can understand.

🙂

Thanks,
Rich

0 Karma

Avaclon
New Member

Sorry for the confusion. Think of list 1 as size of things I need to put into buckets. List 2 is the size of the buckets. The order of buckets needs to be kept (as in you must fill out bucket 1 before bucket 2). If bucket is too small, you can combine bucket (they can be magical buckets), and we can count them as sum of their sizes, which is why you see the last item in list 1, which a size of 5, going into bucket 7, because bucket 5 & 6 is only size 2, so you need to combine bucket 5+6+7 to fit that last item.

Hope this clarifies it?

0 Karma

lguinn2
Legend

I can't understand your question. I see the two lists: one list is effort and the other list is hours per sprint.
But what are you calculating in the result table? I see that the left column is the equivalent of the effort list. But how in the world is the right column (Sprint) calculated? What do you mean by bucket? How do I identify the composition of a bucket?

0 Karma

Avaclon
New Member

Sorry for the confusion. The 2nd list [5 3 3 6 2 2 2] is basically the bucket size. Bucket 1 size is 5, bucket 2 size is 3, etc etc. We're putting things from list 1 into the buckets. So item 1 & 2 fit into bucket 1, item 3 can only fit into bucket 2 because there's no more room in bucket 1, and the last item, item 5, is too big for bucket 5,6, so it requires combination of 3 buckets to put it into, thus bucket 7.

0 Karma
Get Updates on the Splunk Community!

Join Us for Splunk University and Get Your Bootcamp Game On!

If you know, you know! Splunk University is the vibe this summer so register today for bootcamps galore ...

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

.conf24 is taking place at The Venetian in Las Vegas from June 11 - 14. Continue reading to learn about the ...

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...