Splunk Search

How to compare events from two sources to find outliers in data?

bcusick
Communicator

Hi,

I'm trying to compare events from two sources to show where the outliers are (they "should" be the same but we know that there are discrepancies.

I can compare "number of rows/events" easily with a "chart count by source" command, but I also want to check the integrity of the field values.

Basically, each event has 10 fields (same ten fields in each source). How do I check that they are the same, and return some kind of message/raw event/field value if they are not the same?

Thanks.

0 Karma

somesoni2
Revered Legend

May be something like this

source=source1 | eval source1Data=Field1."##".Field2."##"...<<all 10 fields concatenated>>."##".Field10 | appendcols [search source=source2 | eval source2Data=Field1."##".Field2."##"...<<all 10 fields concatenated>>."##".Field10]
| eval result=if(source1Data=source2Data,"Matched","Unmatched")
0 Karma

bcusick
Communicator

Would it be possible to do this with raw data? Meaning using the field "_raw"? This runs, but results are incorrect due to the sources being different..Here's that example:

source="D:\Bluesheets\ExtractFromFES.csv" eval FESdata=_raw | appendcols [search source="D:\Bluesheets\SentToReg.csv" | eval Regdata=_raw] | eval result=if(FESdata=Regdata,"Matched","Unmatched") | table result

0 Karma

SierraX
Communicator

Ok it's maybe a bit late … but for future searchers the other answers are to complicated, no help or wrong
```
index=main sourcetype="test2" | stats values(source) as sources by _raw | eval sources=if(mvcount(sources)>1,"match","no match")
```
Beware of linecounts>1 in the main search, this could create false "no match"

0 Karma

somesoni2
Revered Legend

Can you post your query? and may be some sample data?

0 Karma

bcusick
Communicator

This keeps telling me I have mismatched "]" but I checked multiple times to ensure it's correct. Could the fact that my fields contain "." and "-" have anything to do with this?

0 Karma

somesoni2
Revered Legend

Since I used appendcols, it will compare source1 event1 with source2 event1. It would fail for the cases no of rows differ in the sources.

0 Karma

bcusick
Communicator

This looks like it will work. Will provide an update tomorrow. Will this know to compare source1 event1 with source2 event1?

0 Karma

bcusick
Communicator

This is for reporting to regulators, so everything should be EXACTLY the same. Same timestamp, same field order, etc. I want to be able to check if any fieldname is different (I can pivot on field TRANSACTION_ID) for everything else. All field names static, and yes, If row numbers are the same (which they should be) I should be able to compare row1.source1 to row1.source2

0 Karma

somesoni2
Revered Legend

Does both sources have timestamp and do they differ? What should be the order of rows/events for field comparison;first row of source1 with first row of source2?? Are field names static?

0 Karma
Get Updates on the Splunk Community!

Updated Team Landing Page in Splunk Observability

We’re making some changes to the team landing page in Splunk Observability, based on your feedback. The ...

New! Splunk Observability Search Enhancements for Splunk APM Services/Traces and ...

Regardless of where you are in Splunk Observability, you can search for relevant APM targets including service ...

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...