Splunk Search

Search over two sources and present combined results

frank_zhang
Path Finder

Hi,

I have the following two sources:

Source1:

| Time  | IP       | MAC    |
| 08:01 | 10.0.1.1 | MAC1   |
| 08:02 | 10.0.1.2 | MAC2   |
......
| 08:31 | 10.0.1.1 | MAC1-1 |
......
| 09:01 | 10.0.1.1 | MAC1-2 |
| 09:02 | 10.0.1.2 | MAC2   |
| 09:03 | 10.0.1.3 | MAC3   |

Raw events for this source are generated constantly, and same IP might be associated with different MAC address over time.

Source2:
| Time | IP | Site | Used |
| 08:00 | 10.0.1.1 | Site1 | Used |
| 08:00 | 10.0.1.2 | Site2 | Unused |
......
| 08:29 | 10.0.1.1 | Site1 | Unused |
| 08:30 | 10.0.1.1 | Site1-1 | Unused |
......
| 09:00 | 10.0.1.1 | Site1 | Used |
| 09:00 | 10.0.1.2 | Site2 | Used |
......
| 09:10 | 10.0.1.3 | Site3 | Used |

Raw events for this source are also generated constantly, and independant from Source1.

The report I'm trying to build is to search over both sources and present combined results as below:

| Time  | IP       | MAC    | Site    | Used   |
| 08:01 | 10.0.1.1 | MAC1   | Site1   | Used   |
| 08:02 | 10.0.1.2 | MAC2   | Site2   | Unused |
......
| 08:31 | 10.0.1.1 | MAC1-1 | Site1-1 | Unused |
......
| 09:01 | 10.0.1.1 | MAC1-2 | Site1   | Used   |
| 09:02 | 10.0.1.2 | MAC2   | Site2   | Used   |
| 09:03 | 10.0.1.3 | MAC3   | NULL    | NULL   |

I.e., events from both sources are joined by the IP field, and the "Site" and "Used" values are based on the latest event in Source2 at that time for that specific IP.

Could anyone please shed some light on how to build such search? I assume I should use the transaction command but haven't figure out how to use it correctly.

Thanks.

Tags (3)
0 Karma

MuS
SplunkTrust
SplunkTrust

Hi frank_zhang,

give this a try:

base search yielding events from both sources | stats last(Site) AS Site, last(Used) AS Used by IP, MAC, Time

that should provide what you described in the last comment.

hope this helps, also take a look at this answer

cheers, MuS

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Hmm... SplunkLive Hamburg im Mai?

0 Karma

MuS
SplunkTrust
SplunkTrust

Kieler Woche wird nichts, eventuell bin ich im Herbst in der Nähe von Flensburg. Würde mich dann melden 😉

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Zählt sort|streamstats? Falls ja, komm' zur Kieler Woche und ich lass mir gerne einen ausgeben... http://www.consist.de/comply&secure2014

0 Karma

MuS
SplunkTrust
SplunkTrust

arrrgghhhllll ...... Ich spendier 'ne Runde Flens für die Lösung ohne transaction | join | append 😉 😄

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Yeah... see above, grouping by MAC that easily won't work either 😞

0 Karma

MuS
SplunkTrust
SplunkTrust

That's true, this was an early morning answer and I did not consider all possibilities. In this case, it would be better to use something like this:

base search yielding events from both sources | stats last(Time) AS Time, last(Site) AS Site, last(Used) AS Used by IP, MAC

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

I don't think grouping by Time is appropriate - imagine the most recent source2 event for an IP/MAC combo happened in the previous unit of time, then there is no last(Used) for that combination of group by-fields.

martin_mueller
SplunkTrust
SplunkTrust

Give this a shot:

base search yielding events from both sources | transaction IP maxspan=5m | table _time IP MAC Site Used

I've assumed that an IP will not be reused within five minutes, change the span if a shorter reuse cycle is possible. Make sure the field names are exactly as they exist in your sources.

Using transaction allows for small time differences as shown in your first example, 08:01 in source1 and 08:00 in source2 - joining by IP and Time would fail here.
Joining by IP only would fail because of the IP reuse.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

If you have a reasonably small set of data you can sort it by time in ascending order, and copy over the latest seen value of Used and Seen to the source1 events using streamstats:

base blah blah | sort + _time | streamstats latest(Used) as Used latest(Site) as Site by IP | search source=Source1 | table Time IP MAC Site Used

That may not be feasible for large data sets though.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

...now that I look at it again, the two stats by IP MAC queries won't work at all because source2 doesn't have the MAC.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

To get the latest Site/Used value for every IP/MAC combo you can almost use @MuS' query, with only small changes:

base search yielding events from both sources | stats latest(Time) as latestTime latest(Site) AS latestSite, latest(Used) AS latestUsed by IP, MAC

That'll give you a table with those five columns, and one row for every IP/MAC combo in the time range. The source1 events at 08:02 and 09:02 from your first example would be bunched together into one event at 09:02 using the latest values for Used and Site.

0 Karma

frank_zhang
Path Finder

Basically what I wanted to show with this report is "IP and MAC address with the latest known Site and Used status."

I'm thinking that I may specify "maxspan=-1" but that could result in an IP address joined with all previous Site/Used rather than the "latest". Probably in the subsearch I should do " | stats latest(*) by IP".

Do you think it's a viable solution? Is there any performance concern for using "maxspan=-1"?

Thanks again and I'm going to try your suggestions.

0 Karma

frank_zhang
Path Finder

Regarding maxspan=5m, chances are I may end up events like below:
Source1:
| Time | IP | MAC |
| 08:01 | 10.0.1.1 | MAC1 |
| 08:05 | 10.0.1.1 | MAC1-1 |
Source2:
| Time | IP | Site | Used |
| 01:00 | 10.0.1.1 | Site1 | Used | <-- 1st occurence for IP 10.0.1.1

I.e., the latest event in Source2 for an IP was generated far before the event in Source1, and we still need to have them joined together as below:

| Time  | IP       | MAC    | Site  | Used |
| 08:01 | 10.0.1.1 | MAC1   | Site1 | Used |
| 08:05 | 10.0.1.1 | MAC1-1 | Site1 | Used |
0 Karma

frank_zhang
Path Finder

However for the purposes of this report, we don't care if the lease is assigned to a different MAC address, so it's okay to display two final events with same IP/Site/Used but different MAC. Given this I guess IP recycle is not a concern. Please let me know if it may cause other issues that I'm not aware of.

0 Karma

frank_zhang
Path Finder

Thanks for the post. My comments are too long so I'm going to split it:

I didn't realize the time span could be a factor, so the sample events I came up was over simplified.

An IP is likely to be reused within 5 minutes, Source1 is actually DHCP lease events, so there might be a request event immediately followed by an issue event.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

I think it's not enough to join the two sources by IP since the same IP appears at several different times. A simple join should work, however. You can use a transaction if you have distinct events that start and end each transaction.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...