I want to create a query that is like a nested for loop.
IP Addresses
10.10.10.10
11.11.11.11
12.12.12.12
13.13.13.13
I want to start at the first IP address and compare it to the other IP addresses, creating new fields along the way <e.g. comparing 1 to 2>. After the first iteration, I want to compare the 2nd IP address with the other IP addresses, again creating new fields along the way. Is this possible in Splunk, or should I not even attempt this?
Depending on what you want to do afterwards and what other fields you have, it sounds like you want a cross product, and then to do all of the comparisons in parallel. One way to do this: use eventstats to collect the values of all of the IPs and add them as a new field on each event, mvexpand to expand the results to one row per pair, and where to remove the comparisons to self.
So assuming you're starting with a result set of IP addresses (field name ip for brevity):
... | eventstats values(ip) as other | mvexpand other | where ip!=other | ... calculation using ip and other...
Be careful: if you have N values you'll have N*N intermediate results (N*(N-1) once the where has removed the self-comparisons), so a really long list could take a lot of resources. Hopefully this gives you a starting idea.
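For anyone who thinks in loops, here is a rough Python sketch of what the eventstats / mvexpand / where pipeline above computes. It is only an illustration of the logic, not something Splunk runs, and it assumes the four sample IPs from the question are distinct.

```python
from itertools import product

# Sample values from the question (assumed to be distinct).
ips = ["10.10.10.10", "11.11.11.11", "12.12.12.12", "13.13.13.13"]

# eventstats values(ip) as other -> every event carries the full list of IPs
# mvexpand other                 -> one row per (ip, other) combination
# where ip!=other                -> drop the comparisons to self
pairs = [(ip, other) for ip, other in product(ips, ips) if ip != other]

for ip, other in pairs:
    print(ip, "vs", other)

print(len(pairs))  # 4*4 combinations minus the 4 self-pairs
```

Note that each pair shows up twice here, once in each direction, which is where the N*N cost comes from.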
EDIT TO ADD: Actually, I thought about it some more. You want to compare the first to the remaining 3, the second to the remaining 2, and the 3rd to the last (or we could do it backwards: 2nd to first, 3rd to first and second, 4th to first through third, and so on). In that case we can save a bunch of comparisons using streamstats, which is like stats and eventstats but only considers the events seen up to the current point, with similar logic:
... | streamstats current=f values(ip) as other | mvexpand other | ... calculation using ip and other...
And we have a much smaller set of intermediate values (and fewer duplicates if order doesn't matter)
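As a rough Python sketch of the streamstats current=f idea (an illustration of the logic only, assuming the same four distinct sample IPs): each value is paired only with the values that came before it, so every unordered pair appears once.

```python
# Sample values from the question (assumed to be distinct).
ips = ["10.10.10.10", "11.11.11.11", "12.12.12.12", "13.13.13.13"]

pairs = []
seen = []  # stands in for what streamstats has accumulated from earlier events

for ip in ips:
    # current=f: the current value is not part of its own "other" list
    for other in seen:
        pairs.append((ip, other))
    seen.append(ip)

for ip, other in pairs:
    print(ip, "vs", other)

print(len(pairs))  # N*(N-1)/2 pairs instead of N*N
```

The first IP has nothing before it, so it contributes no pairs of its own in this sketch.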
I have a similar question, but I don't quite follow the streamstats logic. My query returns results with multiple fields (sorry if my terminology is off -- I'm new to Splunk), something like the following:
1. src = 1.2.3.4, dest = 2.3.4.5, name = bob
2. src = 2.4.6.8, dest = 3.5.7.9, name = alice
3. src = 2.3.4.5, dest = 1.2.3.4, name = jack
...
I want to do a nested loop search, comparing all pairs, and return entries where src = dest and dest = src for two records (e.g. records 1 and 3 in this example). I know how to do this if I were to download a CSV file and write a Python script to do the nested loop, but it would be great to be able to do this within Splunk. But maybe that's beyond what Splunk can do...
It is indeed a similar question. Would you like me to break it out as a separate but linked question? streamstats performs the same functions as stats, but the difference is that stats operates on the entire result set, whereas streamstats steps through the result set and operates on the subset of results it has already seen. The default is all previous results including the current result (i.e. the result on which it writes the values of its functions to fields), but options like current=f let me adjust which subset of previous results it operates on (in this particular case, excluding the current result from the calculations). Now, a first pass at your problem could be:
... | eval srcdest = if(src < dest, src."x".dest, dest."x".src) | streamstats current=f last(*) as last_* by srcdest | where src=last_dest
We create a field that identifies the pair of src and destination, use streamstats to get the values of the last of that pair, then keep only those where src and dest have traded places. The edge case that reveals itself however is what to do when you have multiple of the same src/dest pair. This solution makes certain assumptions and different solutions would have different ones.
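To make the intent concrete, here is a rough Python sketch of the same logic, using the three sample records from the question. It is an illustration only: it assumes "last" means the most recently seen earlier record with the same pair key, and it keeps just one earlier record per key, which is exactly the multiple-pair edge case mentioned above.

```python
# Sample records from the question.
records = [
    {"src": "1.2.3.4", "dest": "2.3.4.5", "name": "bob"},
    {"src": "2.4.6.8", "dest": "3.5.7.9", "name": "alice"},
    {"src": "2.3.4.5", "dest": "1.2.3.4", "name": "jack"},
]

last_seen = {}  # pair key -> most recent earlier record with that key
matches = []

for rec in records:
    # Same idea as the srcdest eval: order the two addresses as strings so
    # that A->B and B->A produce the same key.
    key = "x".join(sorted([rec["src"], rec["dest"]]))
    prev = last_seen.get(key)
    # Keep the record only if src and dest have traded places.
    if prev is not None and rec["src"] == prev["dest"]:
        matches.append((prev["name"], rec["name"]))
    last_seen[key] = rec

print(matches)
```

With the sample data, only bob's and jack's records share a key, and jack's src equals bob's dest, so that is the one pair kept.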
Thank you very much for the reply! However, I tried this and it doesn't appear to be working. The last piece "where src=last_dest" doesn't seem to work properly from what I can tell.... or maybe the statement before. The edge case you described is also of concern, as I expect multiple pairs of results for the different entries. Also it would be nice to compare other things in addition to src and dest, such as time or other fields (if I can ever get this piece working).
Is there something besides streamstats that might be useful here? This seems like such a simple problem (easy to write in any programming language if I were just reading in a file) that there should be some way of doing this in Splunk.
To think of this another way, I'd like to create strings/tuples/whatever based on combining a few fields from entries in my search results and then iterate through all of those and return those records where the tuples I have formed match.
The IP Addresses in that example are in a multi-value field. And we have more than one row.
That too is not necessarily an insurmountable problem. You could use streamstats to make a row identifier, then expand the IP addresses, and then use the row identifier in a second streamstats to limit the looping to the context of each original row (or not, depending on what you're trying to do).
... | streamstats count as row | mvexpand ip | streamstats current=f values(ip) as other by row | mvexpand other | ...
Of course, if there are other additional constraints, that might change things again. You'd also have to combine the results back together, but stats something() by row might be handy here, again depending on how you need to aggregate things back together by row.
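A rough Python sketch of the per-row version, for illustration only. The rows below are made-up sample data (the 14.14.14.14 address is hypothetical), and counting pairs per row is just a stand-in for whatever aggregation you actually need.

```python
from collections import Counter

# Hypothetical input: each row holds a multi-value list of distinct IPs.
rows = [
    ["10.10.10.10", "11.11.11.11", "12.12.12.12"],
    ["13.13.13.13", "14.14.14.14"],
]

pairs = []
# streamstats count as row -> number the original rows starting at 1
for row_id, ips in enumerate(rows, start=1):
    seen = []  # earlier IPs within this row only (the "by row" part)
    for ip in ips:
        for other in seen:
            pairs.append((row_id, ip, other))
        seen.append(ip)

# Combine the results back together per original row.
pairs_per_row = Counter(row_id for row_id, _, _ in pairs)

print(pairs)
print(dict(pairs_per_row))
```

Because the "seen" list is reset for every row, IPs from different rows are never compared with each other.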
I am starting to think this problem is impossible. Thank you anyway