Dedup within a MV field

pkashou
Explorer

I need to dedup a multivalue field on a per-event basis: something like values(), but limited to one event at a time. The order of values within the field doesn't matter to me, only that there are no duplicates. Any help is greatly appreciated.

My search:

host=test* | transaction Customer maxspan=3m | eval logSplit = split(_raw,",") | eval eventSplit = mvfilter(match(logSplit, "^[Ee]vent-")) | table eventSplit
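To make the filtering step concrete, here is a rough Python model of split(_raw, ",") followed by mvfilter(match(..., pattern)) on a single event. The _raw string is hypothetical and built from comma-separated fields, and split_and_filter is an illustrative name, not a Splunk function. re.search stands in for SPL's match(), with the leading ^ doing the anchoring. The model also shows that a character class written as [E|e] matches a literal | as well as E and e, so [Ee] is the tighter pattern.

```python
import re


def split_and_filter(raw, pattern):
    """Model of: eval logSplit = split(_raw, ",")
                 eval eventSplit = mvfilter(match(logSplit, pattern))"""
    log_split = raw.split(",")
    # Keep only the values the regex matches; ^ anchors it to the value's start.
    return [value for value in log_split if re.search(pattern, value)]


# Hypothetical comma-separated _raw, including a stray "|vent-x" value.
fields = [
    "customer=42",
    "event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success",
    "|vent-x",
    "Event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf",
]
raw = ",".join(fields)

print(len(split_and_filter(raw, "^[E|e]vent-")))  # 3 ("|vent-x" slips through)
print(len(split_and_filter(raw, "^[Ee]vent-")))   # 2
```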

Normal output:

event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-001 = date:02/13/2013 12:49:20 -0500|result:log_success
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1

Preferred output:

event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-001 = date:02/13/2013 12:49:20 -0500|result:log_success
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1
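Since the order of values doesn't matter, the goal is simply to keep one copy of each distinct value. As a quick sanity check in Python (not Splunk), with the distinct values from the outputs above assigned to made-up constant names, removing repeats from the nine normal values leaves the same five values as the preferred output, though not in the same order:

```python
# The five distinct values from the outputs above (constant names are made up).
AVAIL = "event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success"
SCAN = "event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf"
RETRIEVE = "event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad"
LOG = "event-001 = date:02/13/2013 12:49:20 -0500|result:log_success"
P_SUCCESS = "event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1"

normal = [AVAIL, SCAN, RETRIEVE, AVAIL, SCAN, LOG, RETRIEVE, P_SUCCESS, P_SUCCESS]
preferred = [AVAIL, SCAN, LOG, RETRIEVE, P_SUCCESS]

# dict.fromkeys keeps the first occurrence of each value, in order.
deduped = list(dict.fromkeys(normal))

print(len(normal), len(deduped))       # 9 5
print(set(deduped) == set(preferred))  # True
print(deduped == preferred)            # False (same values, different order)
```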

Accepted solution:

martin_mueller
SplunkTrust

You could use the regular dedup command like this:

...  | streamstats count | mvexpand eventSplit | dedup count eventSplit | mvcombine eventSplit | fields - count
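Roughly, the pipeline works like this: streamstats count gives each event a unique running number, mvexpand eventSplit turns every value into its own row carrying that number, dedup count eventSplit keeps the first row for each (number, value) pair, mvcombine eventSplit merges rows that are otherwise identical back into one multivalue field, and fields - count removes the helper field. Below is a rough Python model of those steps, purely to illustrate the logic. It assumes each event can be reduced to its list of eventSplit values (the real mvcombine compares all the other fields, which match here because the rows came from the same event), and dedup_per_event is a hypothetical name.

```python
from itertools import groupby


def dedup_per_event(events):
    """Model of:
    | streamstats count | mvexpand eventSplit | dedup count eventSplit
    | mvcombine eventSplit | fields - count
    Each event is represented only by its list of eventSplit values."""
    # streamstats count: give every event a running number 1, 2, 3, ...
    numbered = list(enumerate(events, start=1))

    # mvexpand eventSplit: one row per value, each row keeping its event's count
    expanded = [(count, value) for count, values in numbered for value in values]

    # dedup count eventSplit: keep only the first row for each (count, value) pair
    seen = set()
    kept = []
    for row in expanded:
        if row not in seen:
            seen.add(row)
            kept.append(row)

    # mvcombine eventSplit + fields - count: fold rows with the same count back
    # into one list and drop the count. In this model an event with no values
    # yields no rows, so it disappears from the result.
    return [[value for _, value in rows] for _, rows in groupby(kept, key=lambda r: r[0])]


events = [["event-001", "event-002", "event-001"], ["event-002", "event-002"], ["event-003"]]
print(dedup_per_event(events))  # [['event-001', 'event-002'], ['event-002'], ['event-003']]
```

Because count differs between events, "event-002" survives in both of the first two events: duplicates are only removed within an event, never across events.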

emiller42
Motivator

I know this is an old question, but I stumbled upon this while trying to do the same thing, and there is now a much cleaner solution:

eval mvfield=mvdedup(mvfield)
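Within each event, mvdedup removes repeated values from that event's multivalue field, while values shared between different events are left alone, which is the per-event behavior asked for. Applied to the original search, that would mean adding | eval eventSplit=mvdedup(eventSplit) before | table eventSplit. Below is a rough Python analogue applied row by row; mvdedup_rows is a hypothetical helper, and keeping the first occurrence of each value is this model's choice, not a documented guarantee about the order mvdedup returns.

```python
def mvdedup_rows(rows):
    """Rough analogue of `eval mvfield=mvdedup(mvfield)` applied to every event.

    Each row is deduplicated on its own; nothing is compared across rows.
    dict.fromkeys keeps the first occurrence of each value (an assumption of
    this model, not a claim about mvdedup's output order)."""
    return [list(dict.fromkeys(values)) for values in rows]


rows = [["event-001", "event-002", "event-001"], ["event-001", "event-001"]]
print(mvdedup_rows(rows))  # [['event-001', 'event-002'], ['event-001']]
```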

danbutterman
Explorer

Exactly what I was looking for.

Love this community.

redc
Builder

I ran into this need today and stumbled across this post...

It's worth noting, for anyone else who finds this post while trying to figure out how to do this, that mvdedup was only introduced in Splunk 6.2.0.

sideview
SplunkTrust

Another idea is to use stats values() with a trick that makes it calculate unique values only within each row: give every row its own number with streamstats, then group by that number.

| streamstats count as row_number | stats values(mvField) as mvField by row_number | fields - row_number
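Because streamstats count as row_number gives every event a unique number, each row_number group contains exactly one event, so values() only sees that event's values. Two side effects are worth knowing: stats outputs only the aggregated field and the by field (so every other field of the event is dropped), and values() returns its distinct values in lexicographic order. Either may account for the odd output the original poster mentions in the thread. A rough Python model, with hypothetical events and a hypothetical helper name:

```python
def values_by_row_number(events):
    """Model of:
    | streamstats count as row_number
    | stats values(mvField) as mvField by row_number
    | fields - row_number
    `events` is a list of dicts; only the "mvField" key is aggregated."""
    output = []
    for row_number, event in enumerate(events, start=1):
        # One event per row_number group, so this is that event's distinct
        # values; sorted() approximates the lexicographic order of values().
        row = {"row_number": row_number, "mvField": sorted(set(event.get("mvField", [])))}
        # stats keeps only the by field and the aggregate; fields - row_number
        # then drops the by field, so fields such as "host" are gone.
        del row["row_number"]
        output.append(row)
    return output


events = [
    {"host": "test1", "mvField": ["event-003", "event-001", "event-003"]},
    {"host": "test2", "mvField": ["event-002", "event-002"]},
]
print(values_by_row_number(events))
# [{'mvField': ['event-001', 'event-003']}, {'mvField': ['event-002']}]
```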

pkashou
Explorer

Thanks to both of you; both approaches worked to a certain degree. The stats trick did some strange things to the output, so I ended up using the mvexpand/mvcombine approach along with eventstats.

Much appreciated!