I have customers who upload sets of files every day. The upload is done automatically. Sometimes there is a hitch in the system and one or more files in a set get uploaded multiple times. The file names all contain the term _seq_ followed by a sequence number, so part of a customer's events will look like this:
abcdef_seq_1
abcdef_seq_2
abcdef_seq_2
abcdef_seq_3
abcdef_seq_4
I only want to show the duplicated upload files, in this case abcdef_seq_2. It shouldn't be that hard, but I'm busting my head over it. What am I missing?
Ultimately I need to put this into a data model for a Pivot.
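To make the goal concrete outside Splunk, here is a minimal Python sketch (not Splunk code) that splits each name on _seq_ and reports every prefix/sequence pair seen more than once. The function name find_duplicate_uploads is hypothetical, and it assumes each name ends in _seq_ followed by a numeric sequence number.

```python
from collections import Counter


def find_duplicate_uploads(names):
    """Return (prefix, seq) pairs that appear more than once, in first-seen order.

    Hypothetical helper; assumes names look like '<prefix>_seq_<number>'.
    """
    counts = Counter()
    for name in names:
        # rpartition splits on the last '_seq_', leaving the number in `seq`
        prefix, sep, seq = name.rpartition("_seq_")
        if not sep:
            continue  # skip names without a sequence marker
        counts[(prefix, int(seq))] += 1
    # Counter keeps insertion order, so results come out in first-seen order
    return [key for key, n in counts.items() if n > 1]


uploads = [
    "abcdef_seq_1",
    "abcdef_seq_2",
    "abcdef_seq_2",
    "abcdef_seq_3",
    "abcdef_seq_4",
]
print(find_duplicate_uploads(uploads))  # [('abcdef', 2)]
```

Keying on the (prefix, number) pair rather than the whole string keeps each customer's file set separate, which matters if several customers use the same sequence numbers.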
I think I finally figured it out. This search returns only those IIS events that have duplicate cs_uri_query values.
sourcetype="iis" cs_uri_query="*_seq*"
| stats first(cs_uri_query) as DupFile, first(cs_username) as Customer, count(cs_uri_query) AS Duplicates by cs_uri_query
| where Duplicates>1
| table Customer, DupFile, Duplicates
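For readers less familiar with stats ... by, this rough Python analogue (not how Splunk actually executes the search) mirrors the pipeline above: group events by cs_uri_query, keep the first cs_username and a count per group, keep groups with more than one event, then project the three columns. Events are modeled as plain dicts and the sample usernames are made up. In Splunk, first() returns the first value seen in search order, which by default is the most recent event; here "first" simply means first in the input list.

```python
def duplicate_files(events):
    """Rough analogue of:
    stats first(cs_uri_query) as DupFile, first(cs_username) as Customer,
          count(cs_uri_query) AS Duplicates by cs_uri_query
    | where Duplicates>1
    | table Customer, DupFile, Duplicates
    """
    groups = {}
    for event in events:
        query = event.get("cs_uri_query")
        if query is None:
            continue  # stats ... by drops events that lack the by-field
        if query not in groups:
            # first(): keep the values from the first event seen for this group
            groups[query] = {
                "Customer": event.get("cs_username"),
                "DupFile": query,
                "Duplicates": 0,
            }
        groups[query]["Duplicates"] += 1
    # where Duplicates>1, then table Customer, DupFile, Duplicates
    # (row order here follows the input and may differ from Splunk's)
    return [
        (row["Customer"], row["DupFile"], row["Duplicates"])
        for row in groups.values()
        if row["Duplicates"] > 1
    ]


events = [
    {"cs_uri_query": "abcdef_seq_1", "cs_username": "customer_a"},
    {"cs_uri_query": "abcdef_seq_2", "cs_username": "customer_a"},
    {"cs_uri_query": "abcdef_seq_2", "cs_username": "customer_a"},
    {"cs_uri_query": "abcdef_seq_3", "cs_username": "customer_a"},
]
print(duplicate_files(events))  # [('customer_a', 'abcdef_seq_2', 2)]
```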
P.S.: Please mark your question as answered by accepting your own answer with the checkbox on the left 🙂
This is the right approach.
To find duplicate values of a field:
* | stats count by myfield | where count>1
To find whole events that are duplicated:
* | stats count by _raw | where count>1
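Both recipes are the same count-then-filter pattern; only the grouping key changes (one field vs. the entire raw event). A minimal Python illustration, assuming events are dicts that carry a _raw string; the helper name duplicates_by and the sample events are hypothetical.

```python
from collections import Counter


def duplicates_by(events, key):
    """Return {value: count} for values of `key` that occur more than once.

    Hypothetical helper mirroring: stats count by <key> | where count>1
    Events missing `key` are skipped, as stats ... by does.
    """
    counts = Counter(e[key] for e in events if key in e)
    return {value: n for value, n in counts.items() if n > 1}


events = [
    {"myfield": "abcdef_seq_2", "_raw": "GET /upload abcdef_seq_2 200"},
    {"myfield": "abcdef_seq_2", "_raw": "GET /upload abcdef_seq_2 200"},
    {"myfield": "abcdef_seq_2", "_raw": "GET /upload abcdef_seq_2 500"},
    {"myfield": "abcdef_seq_3", "_raw": "GET /upload abcdef_seq_3 200"},
]

print(duplicates_by(events, "myfield"))  # {'abcdef_seq_2': 3}
print(duplicates_by(events, "_raw"))     # {'GET /upload abcdef_seq_2 200': 2}
```

Grouping by _raw only catches byte-identical events. Two uploads of the same file logged at different times (real IIS lines include a timestamp) would not match there, which is why grouping by the file-name field fits this question better.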
In Splunk, do you see duplicate events for the files that were uploaded multiple times?