Splunk Search

How to dedup after | stats list(blah)?

packet_hunter
Contributor

Scenario: I am extracting sender domains with the following code:

index=mail sourcetype=xemail

    [search index=mail sourcetype=xemail subject = "Blah" |stats count by UID| fields UID] 

        |stats list(subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID 

            |search status= "Message done" |rex field=sender "[@\.](?<domain>\w+\.\w+)$" | stats list(domain) 

Problem: I am not able to find the correct dedup syntax so I only list one unique domain once.
Currently I see
domain_A.tld
domain_A.tld
domain_A.tld
domain_B.tld
domain_C.tld
domain_A.tld

When I want just
domain_A.tld
domain_B.tld
domain_C.tld

Please provide an example.

Thank you

Tags (3)
0 Karma
1 Solution

MuS
SplunkTrust
SplunkTrust

Hi packet_hunter,

just use values() instead of list() in the last stats:

index=mail sourcetype=xemail
[search index=mail sourcetype=xemail subject = "Blah" |stats count by UID| fields UID] 
|stats list(subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID 
|search status= "Message done" |rex field=sender "[@\.](?<domain>\w+\.\w+)$" | stats values(domain) 

This will return uniq values for domain http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/CommonStatsFunctions .

values(X)   Returns the list of all distinct values of the field X as a multivalue entry. The order of the values is lexicographical.

Also, you may want to look at this answer https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo... and try to use stats in your search, because subsearch have limits and are expensive and slow to run over a large data set.

Hop this helps ...

cheers, MuS

View solution in original post

MuS
SplunkTrust
SplunkTrust

Hi packet_hunter,

just use values() instead of list() in the last stats:

index=mail sourcetype=xemail
[search index=mail sourcetype=xemail subject = "Blah" |stats count by UID| fields UID] 
|stats list(subject) as subj list(sender) as sender list(recipient) as recp list(vendor_action) as status by UID 
|search status= "Message done" |rex field=sender "[@\.](?<domain>\w+\.\w+)$" | stats values(domain) 

This will return uniq values for domain http://docs.splunk.com/Documentation/Splunk/6.3.3/SearchReference/CommonStatsFunctions .

values(X)   Returns the list of all distinct values of the field X as a multivalue entry. The order of the values is lexicographical.

Also, you may want to look at this answer https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo... and try to use stats in your search, because subsearch have limits and are expensive and slow to run over a large data set.

Hop this helps ...

cheers, MuS

Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...