Comments and answers for "Calculate total bps from NetFlow data"
https://answers.splunk.com/answers/703214/calculate-total-bps-from-netflow-data.html
The latest comments and answers for the question "Calculate total bps from NetFlow data"
Comment by to4kawa on to4kawa's comment
https://answers.splunk.com/comments/807100/view.html
There is only one sample log, so I can't build a query. Could you provide 10~20 samples?
Mon, 02 Mar 2020 21:21:41 GMT - to4kawa
Comment by pestatp on pestatp's comment
https://answers.splunk.com/comments/807039/view.html
This still wouldn't provide a total bandwidth from multiple events, correct?
I took the original question to be asking how to get the total bandwidth at a particular time from multiple events, each with a total byte count and a time range.
Mon, 02 Mar 2020 14:02:10 GMT - pestatp
Comment by to4kawa on to4kawa's comment
https://answers.splunk.com/comments/807029/view.html
| makeresults
| eval _raw="{\"22\":1543571246000,\"11\":443,\"12\":\"xxx.xxx.xxx.xxx\",\"23\":16550,\"24\":233,\"14\":0,\"57590\":91,\"1\":3209400,\"2\":2154,\"4\":6,\"5\":0,\"6\":27,\"7\":13726,\"8\":\"yyy.yyy.yyy.yyy\",\"57659\":\"FQDN of URL\",\"10\":0,\"21\":1543571253000}"
| spath
| rename 1 as IN_BYTES, 22 as FIRST_SWITCHED, 23 as OUT_BYTES, 21 as LAST_SWITCHED, 4 as PROTOCOL
| table LAST_SWITCHED, FIRST_SWITCHED, IN_BYTES, OUT_BYTES, PROTOCOL
| foreach *_SWITCHED
    [ eval <<FIELD>>_p = strftime('<<FIELD>>' / 1000, "%F %T") ]
| eval in_bps = IN_BYTES * 8 / ((LAST_SWITCHED - FIRST_SWITCHED) /1000)
| eval out_bps = OUT_BYTES * 8 / ((LAST_SWITCHED - FIRST_SWITCHED) / 1000)
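For reference, the same per-flow arithmetic in plain Python, using the values from the sample event above (assuming, from the 13-digit values, that the SWITCHED timestamps are epoch milliseconds):

```python
# Cross-check of the SPL bps math with the sample event's field values.
IN_BYTES = 3209400
OUT_BYTES = 16550
FIRST_SWITCHED = 1543571246000  # epoch milliseconds (assumption)
LAST_SWITCHED = 1543571253000   # epoch milliseconds (assumption)

# Convert the millisecond span to seconds before dividing.
duration_s = (LAST_SWITCHED - FIRST_SWITCHED) / 1000  # 7.0 seconds
in_bps = IN_BYTES * 8 / duration_s    # ~3.67 Mbit/s
out_bps = OUT_BYTES * 8 / duration_s  # ~18.9 kbit/s
print(round(in_bps), round(out_bps))
```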
This is what we have calculated so far.
Mon, 02 Mar 2020 13:30:13 GMT - to4kawa
Comment by pestatp
https://answers.splunk.com/comments/807026/view.html
Did you ever come up with a solution to this? I am attempting to solve the same problem and can't quite figure it out.
Unfortunately I don't have the option to use Kafka, so an SPL solution would be best for me.
Mon, 02 Mar 2020 12:52:19 GMT - pestatp
Comment by masato_sekiguchi_aaj on masato_sekiguchi_aaj's comment
https://answers.splunk.com/comments/704437/view.html
Thanks for your feedback.
The idea looks good, although I am not sure how I can break up one JSON flow into multiple placeholders. I am also wondering whether the suggested solution would be practical in an environment with more than 1,000 flows per second.
In my case, a long-lived session can have a 120-second timespan. We would need 120,000 placeholders if we had 1,000 such long-lived sessions.
I am currently evaluating Kafka. NetFlow data is sent to Kafka from the flow generator, and Splunk reads it via Kafka Connect for Splunk.
It might be much easier to analyze the flow data in the Kafka layer to derive bps, pps, and other statistics, create a new topic for those statistics, and have Splunk read that topic for visualization.
Although I am new to Kafka, I will also try that path and see which approach is practical.
Tue, 04 Dec 2018 01:39:21 GMT - masato_sekiguchi_aaj
Comment by rich7177 on rich7177's comment
https://answers.splunk.com/comments/704388/view.html
Ah, OK - so have it use the average bit rate over the whole duration, spread out per second (or whatever unit).
This is more difficult. It can be done; I have an example of something similar, but it's a wee bit nasty and very specific to a certain type of data, so it'll take a bit of work before it's ready to post here.
First, though, on the question you didn't quite ask: you can use LAST_SWITCHED as the time if you want. Just assign it like `index=netflow | eval _time = LAST_SWITCHED | bin _time ...` and continue as you have. And right, this works reasonably well as long as the minimum time span you plot is significantly longer than the longest time span in your data, but there are still a lot of boundary effects.
So, to solve the more general problem: the idea is to compute the average rate, break that up into second-by-second values by one of a couple of methods, and then do your final summations.
Using placeholder numbers, say one flow has a start of 5, an end of 10 (inclusive), and a total of 120, and another has start 7, end 9, and total 90. You'd average...
First one to 20 per unit of time; spread that into `5, 20`, `6, 20`, `7, 20`, `8, 20`, `9, 20`, `10, 20`.
Second one to 30 per unit of time; spread into `7, 30`, `8, 30`, `9, 30`.
Then add those back up per second: `5, 20`, `6, 20`, `7, 50`, `8, 50`, `9, 50`, `10, 20`. And ta-da, those are really the numbers you need to work with.
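A minimal Python sketch of this spread-and-resum idea, using the placeholder flows above (the field names are mine):

```python
from collections import defaultdict

# Spread each flow's average rate across every time unit of its duration
# (end is inclusive, matching the example), then sum the rates per unit.
flows = [
    {"start": 5, "end": 10, "total": 120},  # 120 over 6 units -> 20 per unit
    {"start": 7, "end": 9,  "total": 90},   # 90 over 3 units -> 30 per unit
]

per_second = defaultdict(float)
for f in flows:
    units = f["end"] - f["start"] + 1  # inclusive span
    rate = f["total"] / units          # average per unit of time
    for t in range(f["start"], f["end"] + 1):
        per_second[t] += rate          # re-add overlapping flows

print(dict(per_second))
# {5: 20.0, 6: 20.0, 7: 50.0, 8: 50.0, 9: 50.0, 10: 20.0}
```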
As I said, I have some working code for a different problem that I think I can bend to this task, so hopefully I can get a chance to sort it out in the next day or two and post a copy here.
I can fake up some close-enough data easily enough, and I should have time a little later this week to really dig through the example I have and convert it to this task. Someone else might jump in, but if you can wait a bit I think I'll have something worked up.
Mon, 03 Dec 2018 13:51:55 GMT - rich7177
Comment by masato_sekiguchi_aaj on masato_sekiguchi_aaj's comment
https://answers.splunk.com/comments/704341/view.html
Please ignore msec vs. sec; it is not important here.
IN_BYTES is the total number of bytes received between FIRST_SWITCHED and LAST_SWITCHED.
I can derive the average bps with the formula I wrote, but when drawing a bps graph, the graph should show this average bps value starting at FIRST_SWITCHED and ending at LAST_SWITCHED.
If I use the SPL I wrote, the bps value of the flow only appears at _time in the graph.
It does not take the session timespan, FIRST_SWITCHED to LAST_SWITCHED, into consideration.
From a mathematical viewpoint, the bps of a flow is a function over time; I'll write fx(t) here for flow x. The total bits for flow x (IN_BYTES * 8) will be the integral of fx(t) from FIRST_SWITCHED to LAST_SWITCHED.
IN_BITS_x = integrate.quad(lambda t: fx(t), FIRST_SWITCHED, LAST_SWITCHED)
To derive the total bandwidth at a particular time, t1, we need to calculate:
total_bps at t1 = f1(t1) + f2(t1) + ... + fx(t1)
`timechart span=5min sum(in_bps)` does not do this calculation.
timechart just uses the _time value and sums all the bps values without considering FIRST_SWITCHED and LAST_SWITCHED.
In our case, 50% of flows complete in less than a second, and the longest flow lasts 120 seconds.
So just deriving a 5-minute average from IN_BYTES and OUT_BYTES would be good enough, even though it does not consider FIRST_SWITCHED and LAST_SWITCHED. If I use LAST_SWITCHED for _time, the result may be more accurate.
index=netflow
| bin _time span=5min
| stats sum(IN_BYTES) as IN_BYTES sum(OUT_BYTES) as OUT_BYTES by _time, PROTOCOL
| eval 5m_avg=(IN_BYTES+OUT_BYTES)/300/1024/1024
| timechart span=5min max(5m_avg) as 5m_avg by PROTOCOL
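The eval in that search reduces to simple arithmetic, sketched below in Python (the function name is mine). Note that since there is no multiplication by 8, the result is MiB per second rather than megabits per second:

```python
# Arithmetic behind `eval 5m_avg=(IN_BYTES+OUT_BYTES)/300/1024/1024`:
# total bytes in a 5-minute bin, divided by 300 seconds, then by 1024^2.
def five_min_avg(in_bytes_sum, out_bytes_sum):
    """Average transfer rate over a 5-minute bin, in MiB per second."""
    return (in_bytes_sum + out_bytes_sum) / 300 / 1024 / 1024

# e.g. 3 GiB total (2 GiB in + 1 GiB out) over a 5-minute window:
avg = five_min_avg(2 * 1024**3, 1 * 1024**3)
print(avg)  # roughly 10.24 MiB/s
```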
Regarding flow data,
22: FIRST_SWITCHED
11: L4_DST_PORT
12: IPV4_DST_ADDR
23: OUT_BYTES
24: OUT_PKTS
14: OUTPUT_SNMP (interface number)
57590: L7_PROTO
1: IN_BYTES
2: IN_PKTS
4: PROTOCOL
5: SRC_TOS
6: TCP_FLAGS
7: L4_SRC_PORT
8: IPV4_SRC_ADDR
57659: HTTP_HOST
10: INPUT_SNMP (interface number)
21: LAST_SWITCHED
Mon, 03 Dec 2018 02:28:10 GMT - masato_sekiguchi_aaj
Comment by rich7177
https://answers.splunk.com/comments/703317/view.html
Maybe I'm missing something, but I don't see any glaring issues with your methodology (more below), so what makes you think it's wrong? Is it wrong compared with some other product's numbers, or is it wrong in the sense that, given the data you have, Splunk is literally not adding things up right?
Simplifying a bit: you have a timespan and a number of bytes. Doing the math you did should get you bits per (unit of the time difference). I can't confirm which number in your JSON maps to which field, but my guess is that 1543571246000 is LAST_SWITCHED or FIRST_SWITCHED, and there's another timestamp like it later. Those are not seconds; they also include the milliseconds. So the math you have shows bits per millisecond, not bits per second. Could that be it?
Sat, 01 Dec 2018 18:02:56 GMT - rich7177
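A quick numeric check of that factor-of-1000 point, using the sample event's timestamps (assumed to be epoch milliseconds):

```python
# Dividing the bit count by a raw millisecond span gives bits per
# millisecond; divide the span by 1000 first to get bits per second.
IN_BYTES = 3209400
FIRST_SWITCHED = 1543571246000  # epoch ms (assumption)
LAST_SWITCHED = 1543571253000   # epoch ms (assumption)

bits = IN_BYTES * 8
span_ms = LAST_SWITCHED - FIRST_SWITCHED  # 7000 ms
per_ms = bits / span_ms            # bits per millisecond
per_s = bits / (span_ms / 1000)    # bits per second, 1000x larger
print(per_ms, per_s)
```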