Splunk Search

With 2 sources producing similar data, how to dedup events within 2 seconds of each other, but only keep events from one particular source?

gesman
Communicator

I have two sources of traffic logs, my_source1 and my_source2, that record approximately the same data with a few important differences.
I need to dedup data in this way:
source=my_source* | dedup _time, ip, page

But with the following important difference:
If events with the same ip and page occur within 2 seconds of each other, consider them duplicates, but keep only the events from my_source2, even if they occurred earlier.
What's the most efficient way to accomplish that?

Note: the system generates up to 100,000 events per hour.


inode
Explorer

I would suggest using the transaction command if the data volume is not too high. Its biggest advantage is that it aggregates similar events from the distinct sources into one transaction and provides a "duration" field, which is the time between the first and last events in that transaction.

You can then use eval's mvindex() to keep only the first or last value of a multivalue field from the transaction.
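As a rough sketch of that idea (not tested against your data), the search below groups events with the same ip and page that fall within 2 seconds into one transaction, then uses mvfind() and mvindex() to decide which source to report for each group. The field name kept_source is made up for illustration, and the regex assumes the source values are literally my_source1 and my_source2 as written in the question.

```
source=my_source*
| transaction ip page maxspan=2s
| eval kept_source=if(isnotnull(mvfind(source, "^my_source2")), "my_source2", mvindex(source, 0))
| table _time duration eventcount ip page kept_source
```

Two caveats. maxspan=2s limits the time between the first and last event of a transaction, so a long chain of events each less than 2 seconds apart is split into several transactions. Also, the transaction merges the raw events, so this gives one row per group labeled with the preferred source rather than the original my_source2 event on its own; if you need that event's own fields, adding mvlist=true to transaction keeps the values as lists in arrival order so they can be picked out with mvindex().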
