Splunk Search

Compare field from 2 sources and return when source1 has no match in source2

axinjakson
Explorer

I am attempting to take IPs from 2 different sources and output a list for when Source1 has a unique IP that is not present in Source2. Source2 can contain as many unique IPs as it wants, the main goal is just to extract unique IPs from Source1.

Source1

  • 192.168.0.1
  • 192.168.0.2
  • 192.168.0.3*

Source2

  • 192.168.0.1
  • 192.168.0.2
  • 192.168.0.4
  • 192.168.0.5

Output

  • 192.168.0.3*

Using online docs and previous posts I have attempted with the query below in several variations. It works well to give me all matching, but if I attempt <, without stats, where 1<2, or any other variation I can think of, it fails. Help?

sourcetype="SOURCE1" | dedup src_ip | fields src_ip | eval ip=src_ip | eval type=1 | append [ search sourcetype="SOURCE2" | dedup src_addr | fields src_addr | eval ip=src_addr | eval type=2 ] | stats dc(type) as dctype by ip | where dctype>1 | table ip

1 Solution

lguinn2
Legend

I am not sure if this will work, but I tested as best I could with the data that I had at hand:

sourcetype=SOURCE1 earliest=-1h | dedup src_ip |  eval ip=src_ip | eval source1="Y" | table ip source1 | 
join ip type=outer [search sourcetype=SOURCE2 earliest=-1h | dedup src_addr | eval ip=src_addr |
eval source2 = "Y" | table ip source2 ]

After you run this command, you should have 3 columns of output:

ip source1 source2

The ip column will be the ip address, of course. The source1 column will contain Y if the ip address existed in source1, and the source2 column will contain Y if the ip address existed in source2. To go further, you could append the following to the search to get the variations that you want

To only show IPs that appear in both, add:

| where source1 = "Y" AND source2 = "Y"

To only show IPs that appear ONLY in source1, add:

| where source1 = "Y" AND source2 != "Y"

You get the idea. There might be a more efficient way to do this, but this gives you a "base search" that is pretty flexible. You could even put the "base search" in a macro, which would make it even easier to type.

Note the use of earliest=-1h in the two searches; this will keep your working data size under reasonable control, and help keep the search reasonably quick.

View solution in original post

kristian_kolb
Ultra Champion

Hi, have you tried a simple subsearch? This should be more efficient than a join.

sourcetype=SOURCE_1 NOT [search sourcetype=SOURCE_2 | dedup src_ip | fields + src_ip] | dedup src_ip | stats values(src_ip)

This subsearch 'expands' to src_ip=xxx OR src_ip=yyyy OR src_ip=zzzzz, and combining this with NOT should get you the results you desire. For more information, please see the docs.

http://docs.splunk.com/Documentation/Splunk/4.2.2/User/HowSubsearchesWork#Use_subsearch_to_correlate...

Hope this helps,

Kristian

cabauah
Path Finder

subsearch has a limit. can we use simple stats to find what's missing in two sourcetypes based on a common field

0 Karma

lguinn2
Legend

I am not sure if this will work, but I tested as best I could with the data that I had at hand:

sourcetype=SOURCE1 earliest=-1h | dedup src_ip |  eval ip=src_ip | eval source1="Y" | table ip source1 | 
join ip type=outer [search sourcetype=SOURCE2 earliest=-1h | dedup src_addr | eval ip=src_addr |
eval source2 = "Y" | table ip source2 ]

After you run this command, you should have 3 columns of output:

ip source1 source2

The ip column will be the ip address, of course. The source1 column will contain Y if the ip address existed in source1, and the source2 column will contain Y if the ip address existed in source2. To go further, you could append the following to the search to get the variations that you want

To only show IPs that appear in both, add:

| where source1 = "Y" AND source2 = "Y"

To only show IPs that appear ONLY in source1, add:

| where source1 = "Y" AND source2 != "Y"

You get the idea. There might be a more efficient way to do this, but this gives you a "base search" that is pretty flexible. You could even put the "base search" in a macro, which would make it even easier to type.

Note the use of earliest=-1h in the two searches; this will keep your working data size under reasonable control, and help keep the search reasonably quick.

regriffith
Path Finder

The statement | where source1 = "Y" AND source2 != "Y" doesn't work. You can use | where source1 = "Y" AND NOT source2= "Y".

0 Karma

lguinn2
Legend

Instead of running "Y" on both, you could just eliminate the type=outer. An outer join says "create a joined result even when one of the two events doesn't exist." A normal join, AKA an inner join, says "only create a joined if both events exist" -- this should only give you the 50 that are present in both, and it is more efficient. Otherwise, you are creating the full result set and then eliminating some of them...

And DOH! I often forget that the syntax for the where command and the syntax for search are NOT the same! Good catch.

0 Karma

axinjakson
Explorer

Works 99% Thanks! The output tables will add nicely to my reporting.

Have 1 issue though... The initial output gives 85 results total, 50 are present in both, 35 are not present in Source2. I can run "where = Y" on both and get the correct return of 50 total. However when I try the != I get no results at all?? Same as your last where string of Source1=Y and Source2!=Y, formatting is not correct in my comment of course.

EDIT: | where source1 = "Y" AND NOT source2 = "Y" works 🙂

0 Karma
Get Updates on the Splunk Community!

Introducing Splunk Enterprise 9.2

WATCH HERE! Watch this Tech Talk to learn about the latest features and enhancements shipped in the new Splunk ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...

Routing logs with Splunk OTel Collector for Kubernetes

The Splunk Distribution of the OpenTelemetry (OTel) Collector is a product that provides a way to ingest ...