Hi,
I have the following two sources:
Source1:
| Time | IP | MAC |
| 08:01 | 10.0.1.1 | MAC1 |
| 08:02 | 10.0.1.2 | MAC2 |
......
| 08:31 | 10.0.1.1 | MAC1-1 |
......
| 09:01 | 10.0.1.1 | MAC1-2 |
| 09:02 | 10.0.1.2 | MAC2 |
| 09:03 | 10.0.1.3 | MAC3 |
Raw events for this source are generated constantly, and the same IP might be associated with different MAC addresses over time.
Source2:
| Time | IP | Site | Used |
| 08:00 | 10.0.1.1 | Site1 | Used |
| 08:00 | 10.0.1.2 | Site2 | Unused |
......
| 08:29 | 10.0.1.1 | Site1 | Unused |
| 08:30 | 10.0.1.1 | Site1-1 | Unused |
......
| 09:00 | 10.0.1.1 | Site1 | Used |
| 09:00 | 10.0.1.2 | Site2 | Used |
......
| 09:10 | 10.0.1.3 | Site3 | Used |
Raw events for this source are also generated constantly and are independent of Source1.
The report I'm trying to build is to search over both sources and present combined results as below:
| Time | IP | MAC | Site | Used |
| 08:01 | 10.0.1.1 | MAC1 | Site1 | Used |
| 08:02 | 10.0.1.2 | MAC2 | Site2 | Unused |
......
| 08:31 | 10.0.1.1 | MAC1-1 | Site1-1 | Unused |
......
| 09:01 | 10.0.1.1 | MAC1-2 | Site1 | Used |
| 09:02 | 10.0.1.2 | MAC2 | Site2 | Used |
| 09:03 | 10.0.1.3 | MAC3 | NULL | NULL |
I.e., events from both sources are joined on the IP field, and the "Site" and "Used" values come from the latest Source2 event at that time for that specific IP.
Could anyone please shed some light on how to build such a search? I assume I should use the transaction command but haven't figured out how to use it correctly.
Thanks.
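(Outside Splunk, the semantics described above amount to an "as-of" join: for each Source1 event, attach the most recent Source2 state for the same IP at or before that time. A minimal Python sketch using the sample rows from the question — the data and function name are illustrative only:)

```python
# Illustrative "as-of" join over the sample data from the question.
source1 = [
    ("08:01", "10.0.1.1", "MAC1"),
    ("08:02", "10.0.1.2", "MAC2"),
    ("09:03", "10.0.1.3", "MAC3"),
]
source2 = [
    ("08:00", "10.0.1.1", "Site1", "Used"),
    ("08:00", "10.0.1.2", "Site2", "Unused"),
]

def as_of_join(s1, s2):
    rows = []
    for t, ip, mac in s1:
        # latest Source2 event for this IP at or before time t
        candidates = [e for e in s2 if e[1] == ip and e[0] <= t]
        if candidates:
            _, _, site, used = max(candidates, key=lambda e: e[0])
        else:
            site, used = None, None  # no Source2 state yet -> NULLs
        rows.append((t, ip, mac, site, used))
    return rows
```

Note the 10.0.1.3 row comes out with NULL Site/Used, exactly as in the desired report, because Source2 has no event for that IP yet.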
Hi frank_zhang,
give this a try:
base search yielding events from both sources | stats last(Site) AS Site, last(Used) AS Used by IP, MAC, Time
that should provide what you described in the last comment.
hope this helps, also take a look at this answer
cheers, MuS
Hmm... SplunkLive Hamburg in May?
Kieler Woche won't work out; I might be near Flensburg in the autumn. I'd get in touch then 😉
Does sort|streamstats count? If so, come to Kieler Woche and I'll gladly let you buy me a drink... http://www.consist.de/comply&secure2014
arrrgghhhllll ...... I'll buy a round of Flens for the solution without transaction
| join
| append
😉 😄
Yeah... see above, grouping by MAC won't work that easily either 😞
That's true, this was an early morning answer and I did not consider all possibilities. In this case, it would be better to use something like this:
base search yielding events from both sources | stats last(Time) AS Time, last(Site) AS Site, last(Used) AS Used by IP, MAC
I don't think grouping by Time is appropriate - imagine the most recent source2 event for an IP/MAC combo happened in the previous unit of time; then there is no last(Used) for that combination of group-by fields.
Give this a shot:
base search yielding events from both sources | transaction IP maxspan=5m | table _time IP MAC Site Used
I've assumed that an IP will not be reused within five minutes; change the span if a shorter reuse cycle is possible. Make sure the field names exactly match those in your sources.
Using transaction allows for small time differences as shown in your first example, 08:01 in source1 and 08:00 in source2 - joining by IP and Time would fail here.
Joining by IP only would fail because of the IP reuse.
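(To make the transaction IP maxspan=5m idea concrete, here is a toy Python model of it — not Splunk's actual implementation; events sharing an IP within a 5-minute window get merged into one record. Data and field names are illustrative:)

```python
from datetime import datetime, timedelta

# Toy model of `transaction IP maxspan=5m`: merge events with the same
# IP whose timestamps fall within 5 minutes of the group's first event.
events = [
    {"time": "08:00", "ip": "10.0.1.1", "site": "Site1", "used": "Used"},
    {"time": "08:01", "ip": "10.0.1.1", "mac": "MAC1"},
    {"time": "09:00", "ip": "10.0.1.1", "site": "Site1", "used": "Used"},
    {"time": "09:01", "ip": "10.0.1.1", "mac": "MAC1-2"},
]

def transactions(events, span=timedelta(minutes=5)):
    groups = []
    open_group = {}  # ip -> currently open merged record
    for ev in sorted(events, key=lambda e: e["time"]):
        t = datetime.strptime(ev["time"], "%H:%M")
        g = open_group.get(ev["ip"])
        if g is None or t - g["start"] > span:
            g = {"start": t}          # too far apart: start a new group
            open_group[ev["ip"]] = g
            groups.append(g)
        g.update({k: v for k, v in ev.items() if k != "time"})
    return groups
```

Here the 08:00 Source2 event and the 08:01 Source1 event merge into one record, while the 09:00/09:01 pair form a second one, which is why small time offsets between the sources are tolerated.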
If you have a reasonably small set of data you can sort it by time in ascending order and copy over the latest seen values of Used and Site to the source1 events using streamstats:
base blah blah | sort + _time | streamstats latest(Used) as Used latest(Site) as Site by IP | search source=Source1 | table Time IP MAC Site Used
That may not be feasible for large data sets though.
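(The sort + streamstats approach boils down to a forward fill per IP: walk all events in time order, remember the last Site/Used seen for each IP, and stamp those onto each Source1 event. A rough Python equivalent — the event dicts and the "source" key are illustrative, not actual Splunk output:)

```python
# Forward-fill sketch of sort + streamstats latest(Used)/latest(Site) by IP.
events = [
    {"time": "08:00", "source": "s2", "ip": "10.0.1.1", "site": "Site1", "used": "Used"},
    {"time": "08:01", "source": "s1", "ip": "10.0.1.1", "mac": "MAC1"},
    {"time": "08:30", "source": "s2", "ip": "10.0.1.1", "site": "Site1-1", "used": "Unused"},
    {"time": "08:31", "source": "s1", "ip": "10.0.1.1", "mac": "MAC1-1"},
]

def forward_fill(events):
    state = {}  # ip -> (site, used): latest Source2 values seen so far
    out = []
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["source"] == "s2":
            state[ev["ip"]] = (ev["site"], ev["used"])
        else:  # Source1 event: attach the carried-forward state
            site, used = state.get(ev["ip"], (None, None))
            out.append((ev["time"], ev["ip"], ev["mac"], site, used))
    return out
```

Note this naturally handles a Source2 event that happened hours earlier: the state just keeps carrying forward until a newer Source2 event replaces it.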
...now that I look at it again, the two stats by IP MAC queries won't work at all because source2 doesn't have the MAC.
To get the latest Site/Used value for every IP/MAC combo you can almost use @MuS' query, with only small changes:
base search yielding events from both sources | stats latest(Time) as latestTime latest(Site) AS latestSite, latest(Used) AS latestUsed by IP, MAC
That'll give you a table with those five columns, and one row for every IP/MAC combo in the time range. The source1 events at 08:02 and 09:02 from your first example would be bunched together into one event at 09:02, using the latest values for Used and Site.
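(In Python terms, stats latest(...) by IP, MAC behaves roughly like the sketch below: keep only the values from each IP/MAC pair's most recent event. Sample data is illustrative:)

```python
# Sketch of `stats latest(Time) latest(Site) latest(Used) by IP, MAC`:
# one surviving row per (IP, MAC) pair, taken from its newest event.
events = [
    ("08:02", "10.0.1.2", "MAC2", "Site2", "Unused"),
    ("09:02", "10.0.1.2", "MAC2", "Site2", "Used"),
    ("09:01", "10.0.1.1", "MAC1-2", "Site1", "Used"),
]

def latest_by_ip_mac(events):
    latest = {}  # (ip, mac) -> (time, site, used) of the newest event
    for t, ip, mac, site, used in events:
        key = (ip, mac)
        if key not in latest or t > latest[key][0]:
            latest[key] = (t, site, used)
    return latest
```

This is how the 08:02 and 09:02 rows for 10.0.1.2/MAC2 collapse into a single 09:02 row.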
Basically what I wanted to show with this report is "IP and MAC address with the latest known Site and Used status."
I'm thinking that I may specify "maxspan=-1" but that could result in an IP address joined with all previous Site/Used rather than the "latest". Probably in the subsearch I should do " | stats latest(*) by IP".
Do you think it's a viable solution? Is there any performance concern for using "maxspan=-1"?
Thanks again and I'm going to try your suggestions.
Regarding maxspan=5m, chances are I may end up events like below:
Source1:
| Time | IP | MAC |
| 08:01 | 10.0.1.1 | MAC1 |
| 08:05 | 10.0.1.1 | MAC1-1 |
Source2:
| Time | IP | Site | Used |
| 01:00 | 10.0.1.1 | Site1 | Used | <-- 1st occurrence for IP 10.0.1.1
I.e., the latest event in Source2 for an IP was generated far before the event in Source1, and we still need to have them joined together as below:
| Time | IP | MAC | Site | Used |
| 08:01 | 10.0.1.1 | MAC1 | Site1 | Used |
| 08:05 | 10.0.1.1 | MAC1-1 | Site1 | Used |
However, for the purposes of this report we don't care if the lease is assigned to a different MAC address, so it's okay to display two final events with the same IP/Site/Used but different MACs. Given this, I guess IP recycling is not a concern. Please let me know if it may cause other issues that I'm not aware of.
Thanks for the post. My comments are too long so I'm going to split it:
I didn't realize the time span could be a factor, so the sample events I came up with were oversimplified.
An IP is unlikely to be reused within 5 minutes; Source1 is actually DHCP lease events, so there might be a request event immediately followed by an issue event.
I think it's not enough to join the two sources by IP alone, since the same IP appears at several different times. A simple join should work, however. You can use a transaction if you have distinct events that start and end each transaction.