Hello, I am trying to do multiple aggregations on the same data, each grouped by different fields.
I have the following data:
from_timestamp  from  to   to_timestamp  Amount
1446741264      100   104  1446741274    100
1446741265      101   100  1446741270    200
1446741270      100   101  1446741271    150
1446741275      102   103  1446741277    200
1446741277      100   101  1446741278    120
1446741277      101   100  1446741264    200
The data represents transactions between two accounts: each transaction is created by one account ("from") and accepted by another account ("to") for a given amount, and the two timestamps are the creation timestamp and the acceptance timestamp.
Now I want to calculate the following result:
from  to   min(delta from_timestamp)  min(delta to_timestamp)  sum(amount)
100   104  6                          0                        100
100   101  6                          7                        270
101   100  12                         6                        400
102   103  0                          0                        200
Which is the combination of the following searches into one table:
stats sum(amount) by from, to
- the sum(amount) aggregated based on unique pairs of "from" and "to"
stats min(delta of consecutive from_timestamp by from)
- minimum of delta of from_timestamps when grouped by from
stats min(delta of consecutive to_timestamp by to)
- minimum of delta of to_timestamps when grouped by to
The column min(delta from_timestamp) is generated by grouping the data by "from" (ignoring "to"), calculating the deltas between consecutive from_timestamps within each group, and taking the minimum. For example, the value 6 in the first and second result rows is calculated as min(1446741270-1446741264, 1446741277-1446741270) = min(6, 7) = 6 (these are the from_timestamps from rows 1, 3, and 5).
So two questions:
How do I group the data by one field ("from" or "to"), take the deltas of consecutive timestamps, and then get the minimum of the result? (I tried the transaction command but it returned no results.)
How do I combine the three searches into one table as shown above?
Or is there maybe a way to do all three aggregations in a single query?
Thanks
Update:
I have almost figured it out by using a combination of streamstats, eventstats, stats, eval, and fillnull:
sort from_timestamp
| streamstats current=f window=1 global=f last(from_timestamp) as last_from_ts by from
| sort to_timestamp
| streamstats current=f window=1 global=f last(to_timestamp) as last_to_ts by to
| fillnull value=0 last_from_ts, last_to_ts
| eval from_ts_diff=from_timestamp-last_from_ts
| eval to_ts_diff=to_timestamp-last_to_ts
| eventstats min(from_ts_diff) as min_from_diff by from
| eventstats min(to_ts_diff) as min_to_diff by to
| stats sum(Amount), min(min_from_diff), min(min_to_diff) by from, to
I am not sure if this gives the best performance though. Any other suggestions?
Have you looked at the delta command? http://docs.splunk.com/Documentation/Splunk/6.2.0/SearchReference/Delta
This is what your search could look like (this is untested code):
... | sort from_timestamp
| delta from_timestamp as from_ts_diff
| sort to_timestamp
| delta to_timestamp as to_ts_diff
| stats sum(Amount), min(from_ts_diff), min(to_ts_diff) by from, to
Yes, the problem with delta is that it does not do the grouping.
I need to calculate the delta for each subset of the data that has the same from or to field.
This is why I used streamstats to do the grouping first, and then eval to calculate the delta.
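For what it's worth, a variant of the update's approach that avoids the fillnull step: streamstats with a two-event sliding window can compute the consecutive delta per group directly via range(), leaving only the first event of each group (where the window holds a single event and range is 0) to be nulled out. This is an untested sketch; field names are taken from the sample data above:

```spl
... | sort from_timestamp
| streamstats window=2 global=f range(from_timestamp) as from_ts_diff count as from_n by from
| eval from_ts_diff=if(from_n<2, null(), from_ts_diff)
| sort to_timestamp
| streamstats window=2 global=f range(to_timestamp) as to_ts_diff count as to_n by to
| eval to_ts_diff=if(to_n<2, null(), to_ts_diff)
| eventstats min(from_ts_diff) as min_from_diff by from
| eventstats min(to_ts_diff) as min_to_diff by to
| stats sum(Amount), min(min_from_diff), min(min_to_diff) by from, to
```

The two eventstats calls are still needed because the minimum deltas are computed per "from" (and per "to") across all pairs, while the final table groups by the (from, to) pair.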