Solved: Is it possible to dedup data during indexing?

tylr · ‎02-19-2011

I'm feeding Splunk a large quantity of historical gzipped syslog files from many different machines through a single TCP listener input. These archived files almost certainly contain overlapping data, and new data may arrive that overlaps with the old. I can filter duplicates out of my search results, but is it possible to strip duplicate lines at index time?

Stephen_Sorkin · ‎02-19-2011

No, that is not possible.
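
Since duplicates can't be stripped at index time, one possible workaround is to drop them before the data reaches the TCP input. The following is only a minimal sketch, not a Splunk feature: `unique_lines` is a hypothetical helper, and it assumes the archives are UTF-8 gzipped text files and that any byte-identical line counts as a duplicate.

```python
import gzip
import hashlib
from typing import Iterable, Iterator


def unique_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order, across gzipped files.

    Hypothetical helper for illustration; not part of Splunk.
    """
    seen = set()  # SHA-1 digests of lines already emitted
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.rstrip("\n")
                digest = hashlib.sha1(line.encode("utf-8")).digest()
                if digest in seen:
                    continue  # exact duplicate of an earlier line: skip it
                seen.add(digest)
                yield line
```

The unique lines could then be written to the TCP listener in place of the raw archives. Note that this also drops legitimate repeats (the same message logged twice with the same timestamp), and the `seen` set grows with the number of distinct lines, so it would have to be persisted somewhere to catch overlap with data that arrives later.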

ncsantucci · ‎05-23-2014

This is a similar scenario to logrotate compressing and rotating logs; see http://answers.splunk.com/answers/121267/how-does-splunk-handle-nix-logrotate-based-log-rotation
