Splunk Search

Need help with Predict command to forecast receipts by customer

fmcg
New Member

Hi everyone,

I use Splunk to assign transactions on daily bank statements to Category (eg receipts, payroll etc) and Client ( for receipts Company A, Company B etc).

I have transaction history for the past year and a half and was hoping to use the Predict function to be able to forecast receipts by Client. For example, Company A will pay us on 21st of each month.

I'm new to Splunk so not sure where to start (which screen to use and what syntax), please help!
Thanks!

Faina

0 Karma

DalJeanis
Legend

Predict and trend are not appropriate to this data unless you have quite a bit more than you are telling us. Ideally, you need to know when you billed the client. Do you have that info?

In order to do this properly, you would need the following info for each of the pieces of your billing. That would require access to your actual billing system, for each of the items that made up the bill that was paid, not just the bank statements.

Client ServiceDate ServiceAmount BillingDate BillingAmount ReceivedDate ReceivedAmount

Given that information, you should be able to predict, for each item that has been billed and not yet paid, how long it will take to receive the payment.

On the other hand, if you ONLY have the amount received, you probably aren't going to be able to get a very good prediction.

But, just for fun, let's try.

Suppose that a client tends to pay pretty close to the 15th of the month. All you would need to do then, would be to take the average and standard deviation for the payment data and payment amount, and you're done.

your search that gets the _date, amount, and Client
| eval DayOfMonth=tonumber(strftime(_time,"%d"))
| stats avg(DayOfMonth) as AvgDay stdev(DayOfMonth) as StdevDay by Client

This gives you the expected date, and the variability of that date.

Unfortunately, if someone tends to pay at the end of the month and the start of the next, then average will erroneously drop it into the middle of the month. So, you need to figure out a way to move it forward and back 7-10 days to check for cyclic issues. We'll use 9 days for the example, but you may want to play with values from 6 to 11 days to see what is appropriate to your clients.

your search that gets the _date, amount, and Client
| eval Day0=tonumber(strftime(relative_time(_time,"-9d"),"%d"))
| eval Day1=tonumber(strftime(_time,"%d"))
| eval Day2=tonumber(strftime(relative_time(_time,"+9d"),"%d"))
| stats avg(Day0) as Avg0, stdev(Day0) as Stdev0, avg(Day1) as AvgDay1, stdev(Day1) as Stdev1, avg(Day2) as Avg2, stdev(Day2) as Stdev2 by Client

Now, now we have three different averages and stdevs. Because month-end rollovers cause a HIGHER variability, we want the answer with the lowest stdev. We resolve the unlikely equal case in favor of the actual payment date. Notice that here we are moving the calculated average back to its actual day, but will later have to account for rollovers each way.

| eval AvgDay=if(Stdev1<Stdev0, if(Stdev1<Stdev2, Day1, Day2 - 9), if(Stdev0<Stdev2,Day0 + 9,Day2))
| eval StdevDay=if(Stdev1<Stdev0, if(Stdev1<Stdev2, Stdev1, Stdev2), if(Stdev0<Stdev2,Stdev0,Stdev2))

Okay, now what do we want to report? Assuming a normal curve, one stdev later than the expected day is the date that we expect 84% of the payments to have arrived. Two stdevs later is the date we expect 98% of payments to have arrived. Let's take 1.3 sd as our standard, which means 90% will arrive before that date, and let's make sure that an estimate of 31 or later is moved into the next month, and an estimate lower than 1 is moved into the prior month.

 | eval ProjDay = round(AvgDay + 1.3 * StdevDay)
 | eval ProjDay = if(ProjDay>30,ProjDay-30,if(ProjDay<1,ProjDay+30,ProjDay)

So, ProjDay is the day of the month that that client's payment is 90% likely to have been received.

Now, how much is the client likely to pay? Well, in that case, we're going to want to UNDER estimate it, so we'll take the -1 sd of the average MONTHLY payment amount, That looks like this...

your search that gets the _date, amount, and Client
| bin _time span=1mon
| stats sum(amount) as amount by Client _time
| stats avg(amount) as AmtAvg stdev(amount) as AmtStdev by Client
| eval ProjAmt = round(AmtAvg - 1*AmtStdev,0)

The above two searches can be worked all together

  your search that gets the _date, amount, and Client
| eval Day0=tonumber(strftime(relative_time(_time,"-9d"),"%d"))
| eval Day1=tonumber(strftime(_time,"%d"))
| eval Day2=tonumber(strftime(relative_time(_time,"+9d"),"%d"))
| appendpipe 
    [ | bin _time span=1mon  
      | stats sum(amount) as amount by Client _time
      | stats avg(amount) as AmtAvg stdev(amount) as AmtStdev by Client
      | eval ProjAmt = round(AmtAvg - 1*AmtStdev,0)
      | table Client AmtAvg AmtStdev ProjAmt
      ]
| stats avg(Day0) as Avg0, stdev(Day0) as Stdev0, avg(Day1) as AvgDay1, stdev(Day1) as Stdev1, avg(Day2) as Avg2, stdev(Day2) as Stdev2, avg(AmtAvg) as AmtAvg , avg(AmtStdev) as AmtStdev, avg(ProjAmt) as ProjAmt by Client
| eval AvgDay=if(Stdev1<Stdev0, if(Stdev1<Stdev2, Day1, Day2 - 9), if(Stdev0<Stdev2,Day0 + 9,Day2))
| eval StdevDay=if(Stdev1<Stdev0, if(Stdev1<Stdev2, Stdev1, Stdev2), if(Stdev0<Stdev2,Stdev0,Stdev2))
| eval ProjDay = round(AvgDay + 1.3 * StdevDay)
| eval ProjDay = if(ProjDay>30,ProjDay-30,if(ProjDay<1,ProjDay+30,ProjDay)
| table Client ProjDay ProjAmt AvgDay StdevDay AmtAvg AmtStdev
0 Karma
Get Updates on the Splunk Community!

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...