Cannot figure out how to do bucketing in Splunk us...

Avaclon · ‎03-28-2017

Been at this for a day or two. Can't figure it out.

I have a list of efforts in a column [1 2 3 2 1 3 1 1 1 5]
I have a list of available working hours per Sprint [5 3 3 6 2 2 2]

I want to be able to generate something like
Effort | Sprint
1 | 1
2 | 1
3 | 2
2 | 3
1 | 3
3 | 4
1 | 4
1 | 4
1 | 4
5 | 7

I would do this in DB with a temp table + Cursor, but that doesn't seem to work in Splunk.

I think it has something to do with with the streamstats, but I can't wrap my head around how / where to store the temporary variable, and then distributing properly. Key here is that if it doesn't fit into the bucket, it should abandon the rest of the values remaining in the bucket. Basically, that item #4 needs to be smart to be in Sprint 3, not in Sprint 2, so the sum / dividing would work.

Any help is much appreciated!

DalJeanis · ‎09-18-2018

You need to be comparing the sums up to that point, rather than the individual effort and work numbers. To know which sprint an item would be finished in, the net total work hours available must be the lowest number greater than the total effort required.

The obvious way to do this is with a cross join... which should be discouraged for large implementations...

Here's a run-anywhere sample to show you what we mean...

| makeresults 
| fields - _time
| eval efforts="1 2 3 2 1 3 1 1 1 5" 
| makemv efforts
| mvexpand efforts
| rename COMMENT as "The above produces the individual effort records" 

| rename COMMENT as "This gives them each a record number, and creates a running total." 
| streamstats count as eff_recno sum(efforts) as sumOfEfforts

| rename COMMENT as "Set a junk field to create a cross join." 
| eval junk=1
| join junk max=1000 [
    | makeresults 
    | eval workhours="5 3 3 6 2 2 2" 
    | makemv workhours
    | mvexpand workhours
    | rename COMMENT as "The above produces the individual work records" 


    | rename COMMENT as "This gives them each a record number, and creates a running total." 
    | streamstats count as work_recno sum(workhours) as sumOfWorkHours

    | rename COMMENT as "Set a junk field to create a cross join." 
    | eval junk=1
    ]

| rename COMMENT as "Kill fields where the effort required is greater than the amount available." 
| where sumOfEfforts<sumOfWorkHours

| rename COMMENT as "Accept only the first record where work is greater than the effort required." 
| dedup eff_recno

Richfez · ‎03-28-2017

Could you better describe how Efforts relate to Sprints? How would you know if it would or would not fit into a bucket? What's a bucket in this case? Is this, like, what, Agile stuff? Scrums and things?

It may help to explain what all this is in a way that we simple-minded folk who aren't "Agile-fluent" can understand.

🙂

Thanks,
Rich

Avaclon · ‎03-28-2017

Sorry for the confusion. Think of list 1 as size of things I need to put into buckets. List 2 is the size of the buckets. The order of buckets needs to be kept (as in you must fill out bucket 1 before bucket 2). If bucket is too small, you can combine bucket (they can be magical buckets), and we can count them as sum of their sizes, which is why you see the last item in list 1, which a size of 5, going into bucket 7, because bucket 5 & 6 is only size 2, so you need to combine bucket 5+6+7 to fit that last item.

Hope this clarifies it?

lguinn2 · ‎03-28-2017

I can't understand your question. I see the two lists: one list is effort and the other list is hours per sprint.
But what are you calculating in the result table? I see that the left column is the equivalent of the effort list. But how in the world is the right column (Sprint) calculated? What do you mean by bucket? How do I identify the composition of a bucket?

Avaclon · ‎03-28-2017

Sorry for the confusion. The 2nd list [5 3 3 6 2 2 2] is basically the bucket size. Bucket 1 size is 5, bucket 2 size is 3, etc etc. We're putting things from list 1 into the buckets. So item 1 & 2 fit into bucket 1, item 3 can only fit into bucket 2 because there's no more room in bucket 1, and the last item, item 5, is too big for bucket 5,6, so it requires combination of 3 buckets to put it into, thus bucket 7.

Cannot figure out how to do bucketing in Splunk using return values from previous row

Join Us for Splunk University and Get Your Bootcamp Game On!

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

Announcing Scheduled Export GA for Dashboard Studio