Solved: Compare field from 2 sources and return when sourc...

axinjakson · ‎11-07-2011

I am attempting to take IPs from 2 different sources and output a list for when Source1 has a unique IP that is not present in Source2. Source2 can contain as many unique IPs as it wants, the main goal is just to extract unique IPs from Source1.

Source1

192.168.0.1
192.168.0.2
192.168.0.3*

Source2

192.168.0.1
192.168.0.2
192.168.0.4
192.168.0.5

Output

192.168.0.3*

Using online docs and previous posts I have attempted with the query below in several variations. It works well to give me all matching, but if I attempt <, without stats, where 1<2, or any other variation I can think of, it fails. Help?

lguinn2 · ‎11-07-2011

I am not sure if this will work, but I tested as best I could with the data that I had at hand:

sourcetype=SOURCE1 earliest=-1h | dedup src_ip |  eval ip=src_ip | eval source1="Y" | table ip source1 | 
join ip type=outer [search sourcetype=SOURCE2 earliest=-1h | dedup src_addr | eval ip=src_addr |
eval source2 = "Y" | table ip source2 ]

After you run this command, you should have 3 columns of output:

ip source1 source2

The ip column will be the ip address, of course. The source1 column will contain Y if the ip address existed in source1, and the source2 column will contain Y if the ip address existed in source2. To go further, you could append the following to the search to get the variations that you want

To only show IPs that appear in both, add:

| where source1 = "Y" AND source2 = "Y"

To only show IPs that appear ONLY in source1, add:

| where source1 = "Y" AND source2 != "Y"

You get the idea. There might be a more efficient way to do this, but this gives you a "base search" that is pretty flexible. You could even put the "base search" in a macro, which would make it even easier to type.

Note the use of earliest=-1h in the two searches; this will keep your working data size under reasonable control, and help keep the search reasonably quick.

View solution in original post

kristian_kolb · ‎11-08-2011

Hi, have you tried a simple subsearch? This should be more efficient than a join.

sourcetype=SOURCE_1 NOT [search sourcetype=SOURCE_2 | dedup src_ip | fields + src_ip] | dedup src_ip | stats values(src_ip)

This subsearch 'expands' to src_ip=xxx OR src_ip=yyyy OR src_ip=zzzzz, and combining this with NOT should get you the results you desire. For more information, please see the docs.

http://docs.splunk.com/Documentation/Splunk/4.2.2/User/HowSubsearchesWork#Use_subsearch_to_correlate...

Hope this helps,

Kristian

cabauah · ‎10-30-2017

subsearch has a limit. can we use simple stats to find what's missing in two sourcetypes based on a common field

lguinn2 · ‎11-07-2011

I am not sure if this will work, but I tested as best I could with the data that I had at hand:

sourcetype=SOURCE1 earliest=-1h | dedup src_ip |  eval ip=src_ip | eval source1="Y" | table ip source1 | 
join ip type=outer [search sourcetype=SOURCE2 earliest=-1h | dedup src_addr | eval ip=src_addr |
eval source2 = "Y" | table ip source2 ]

After you run this command, you should have 3 columns of output:

ip source1 source2

The ip column will be the ip address, of course. The source1 column will contain Y if the ip address existed in source1, and the source2 column will contain Y if the ip address existed in source2. To go further, you could append the following to the search to get the variations that you want

To only show IPs that appear in both, add:

| where source1 = "Y" AND source2 = "Y"

To only show IPs that appear ONLY in source1, add:

| where source1 = "Y" AND source2 != "Y"

You get the idea. There might be a more efficient way to do this, but this gives you a "base search" that is pretty flexible. You could even put the "base search" in a macro, which would make it even easier to type.

Note the use of earliest=-1h in the two searches; this will keep your working data size under reasonable control, and help keep the search reasonably quick.

regriffith · ‎02-24-2017

The statement | where source1 = "Y" AND source2 != "Y" doesn't work. You can use | where source1 = "Y" AND NOT source2= "Y".

lguinn2 · ‎11-09-2011

Instead of running "Y" on both, you could just eliminate the type=outer. An outer join says "create a joined result even when one of the two events doesn't exist." A normal join, AKA an inner join, says "only create a joined if both events exist" -- this should only give you the 50 that are present in both, and it is more efficient. Otherwise, you are creating the full result set and then eliminating some of them...

And DOH! I often forget that the syntax for the where command and the syntax for search are NOT the same! Good catch.

axinjakson · ‎11-09-2011

Works 99% Thanks! The output tables will add nicely to my reporting.

Have 1 issue though... The initial output gives 85 results total, 50 are present in both, 35 are not present in Source2. I can run "where = Y" on both and get the correct return of 50 total. However when I try the != I get no results at all?? Same as your last where string of Source1=Y and Source2!=Y, formatting is not correct in my comment of course.

EDIT: | where source1 = "Y" AND NOT source2 = "Y" works 🙂

Compare field from 2 sources and return when source1 has no match in source2

Introducing Splunk Enterprise 9.2

Adoption of RUM and APM at Splunk

Routing logs with Splunk OTel Collector for Kubernetes