Splunk Search

Dedup within a MV field

pkashou
Explorer

I need the ability to dedup a multi-value field on a per event basis. Something like values() but limited to one event at a time. The ordering within the mv doesn't matter to me, just that there aren't duplicates. Any help is greatly appreciated.

My search:

host=test* | transaction Customer maxspan=3m | eval logSplit = split(_raw,",") | eval eventSplit = mvfilter(match(logSplit, "^[E|e]vent-")) | table eventSplit

Normal output:

event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-001 = date:02/13/2013 12:49:20 -0500|result:log_success
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1

Preferred output:

event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-001 = date:02/13/2013 12:49:20 -0500|result:log_success
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1

Tags (1)
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

You could make use of the regular dedup like this:

...  | streamstats count | mvexpand eventSplit | dedup count eventSplit | mvcombine eventSplit | fields - count

View solution in original post

emiller42
Motivator

I know this is an old question, but I stumbled upon this while trying to do the same thing, and there is now a much cleaner solution:

eval mvfield=mvdedup(mvfield)

danbutterman
Explorer

Exactly what I was looking for.

Love this community.

redc
Builder

I ran into this need today and stumbled across this post...

It's worth noting for anyone else who finds this post while trying to figure out how to do this that <code>mvdedup</code> was only introduced in 6.2.0.

0 Karma

sideview
SplunkTrust
SplunkTrust

Another idea is to use stats values(), but do a weird trick to make it calculate unique values only within each row.

| streamstats count as row_number | stats values(mvField) as mvField by row_number | fields - row_number

martin_mueller
SplunkTrust
SplunkTrust

You could make use of the regular dedup like this:

...  | streamstats count | mvexpand eventSplit | dedup count eventSplit | mvcombine eventSplit | fields - count

pkashou
Explorer

Thanks to both of you as these both worked to a certain degree. The stats weird trick did some strangeness to the output so I ended up using the mvexpand/mvcombine approach along with eventstats.

Much appreciated!

Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...