Splunk Search

I want to delete duplicates from the splunk index which have same _raw and same _time

ashutoshsharma1
Path Finder

Tried using the already answered question on splunk answer on the same topic they say do it using lookup or sub search. like this

Error on using delete on stream stats:-

index=idx1 sourcetype=csv|streamstats count by _raw _time |where cnt>1|delete

Both the below methods are giving me wrong output i have many duplicates but below commands only showing me a few.
Subsearch method:-

index=idx1 sourcetype=csv| eval eid=_cd|search[ |streamstats count as cnt by _raw _time | where cnt>1| field eid ]|delete

Lookup method:-

index=idx1 sourcetype=csv|streamstats count by _raw _time |where cnt>1| eid=_cd | stats count by eid| fields eid| outputcsv del_id.csv

index=id1 sourcetype=csv|eval eid=_cd | search [ del_id.csv ] | delete

Tags (1)
0 Karma

martin_mueller
SplunkTrust
SplunkTrust

This will give you the data you need:

index=foo sourcetype=bar | stats count by _time _raw | where count > 1

For large amounts of data this might take a while.

To actually delete, you cannot use nonstreaming commands - you'd need to write the _bkt and _cd fields into a lookup, and then search like this:

index=foo sourcetype=bar [inputlookup that_lookup] | delete

Make sure to test this on unimportant data first!

martin_mueller
SplunkTrust
SplunkTrust

Streamstats is centralized streaming command that gets executed on the search head... the delete has to run on the indexers though, they're the ones storing and therefore deleting data. As a result, you can only run distributed streaming commands before a delete.
See http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchReference/Commandsbytype for reference.

As for your second comment, the stats only produces three fields - _time _raw count. There's no _bkt and _cd field present. One approach to copy them over might work something like this (untested):

index=foo sourcetype=bar | stats count list(_cd) as _cd list(_bkt) as _bkt by _time _raw | where count > 1 | eval _cd = mvindex(_cd, 1, -1) | eval _bkt = mvindex(_bkt, 1, -1) | eval zip = mvzip(_cd, _bkt, "##breaker##") | mvexpand zip | eval zip = split(zip, "##breaker##") | eval _cd = mvindex(zip, 0) | eval _bkt = mvindex(zip, 1) | fields - zip count | outputlookup ...

The list() tell the stats to keep a list of the two fields identifying an event, and the first two mvindex() should keep all but the first element in the list. The whole mvzip|mvexpand|split|mvindex part should put every event into one row.

Again, important reminder: Don't run this without first extensively testing on unimportant data!

0 Karma

ashutoshsharma1
Path Finder

We are trying to put the event Ids of duplicate records in a csv file. However when we run the below command it gives no results.

index=idx1 sourcetype=csv|stats count by _raw _time |where cnt>1|eval eid=_cd*|stats count by eid|fields - count|outputlookup filename.csv*

But when we run it without the part in bold, it gives the correct count.
So we are not able to figure out what is wrong in the later part of the query highlighted in bold.

0 Karma

ashutoshsharma1
Path Finder

@martin_mueller
Sir I will try your answer .
Can you please help me understand why i am not getting the correct answer using the streamstats and count>1 .As i don't have much understanding how streamstats works can be a good learning.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

I'd be careful about that. What is delete supposed to do after a reporting command such as stats? Delete both inputs to the stats? Delete only one? Which one?

0 Karma

inventsekar
Ultra Champion

ohk, as the question was - "I want to delete duplicates from the splunk index which have same _raw and same _time", and, so, after finding "the data you need" thru your first query, i thought he should use |delete command.
now only the thought came to me - why delete after stats.
thanks a lot.. good learning!

0 Karma

inventsekar
Ultra Champion

adding the |delete at the end -
index=idx1 sourcetype=csv|stats count by _raw _time |where count > 1|delete

0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...