Splunk Search

Question on how to prevent an event from being reindexed

dsadowski
New Member

I have a web application that produces a fairly complicated log structure that looks something like the following.

{ "total":6789, data:[{e1}. {e2}, {e3}] }

I have a Python script that scrapes the application every few minutes to get the JSON out of the web app and onto the file system, so each file on disk has the same structure as the example above.
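
For reference, the scraping step is basically the following (simplified; the URL and output path here are placeholders, not the real ones):

import requests

# Placeholder URL and output path -- the real script uses different values.
APP_URL = "http://example.local/api/events"
OUT_FILE = "/var/log/webapp/events.json"

def scrape():
    # Pull the JSON blob ({"total": N, "data": [...]}) from the web app
    resp = requests.get(APP_URL, timeout=30)
    resp.raise_for_status()
    # Write the raw payload to the file that Splunk monitors
    with open(OUT_FILE, "w") as f:
        f.write(resp.text)

if __name__ == "__main__":
    scrape()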

I've been able to break the events out of the data section of the array so that Splunk can index the individual {e1}, {e2}, {e3} events. The problem I am facing is that each time the scraping script runs, I get duplicate events. I seem to get the same event repeated n times until it rolls out of the log.

I think the problem is that over time the events 'move' through the log files, so it looks to Splunk like the file is always changing.

Over time, the files look something like the following:
{ "total":6743, data:[{e1}. {e2}, {e3}] }
{ "total":6522, data:[{e2}. {e3}, {e4}] }
{ "total":6456, data:[{e3}. {e4}, {e5}] }

This seems to make Splunk index e3 three times.

Is there an easy way to keep Splunk from reindexing events it has already seen, without having to do a bunch of diffing in the script to filter out the duplicates?
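
The only workaround I've come up with so far is to make the script itself remember which events it has already written and only append new ones, which is exactly the kind of diffing I'd like to avoid. A rough sketch (the 'id' field and the paths are made up for illustration; I don't know yet what a stable unique key for my events would be):

import json
import os

STATE_FILE = "/var/log/webapp/.seen_ids"   # hypothetical: IDs already written
OUT_FILE = "/var/log/webapp/events.log"    # file Splunk monitors

def append_new_events(payload):
    """Append only events whose (made-up) 'id' field hasn't been seen before."""
    seen = set()
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            seen = set(line.strip() for line in f)

    new_ids = []
    with open(OUT_FILE, "a") as out:
        for event in payload.get("data", []):
            event_id = str(event.get("id"))   # assumes each event carries a unique id
            if event_id in seen:
                continue
            out.write(json.dumps(event) + "\n")
            new_ids.append(event_id)

    # Remember the newly written IDs for the next run
    if new_ids:
        with open(STATE_FILE, "a") as f:
            f.write("\n".join(new_ids) + "\n")

That works in principle, but it means maintaining state outside of Splunk, which is why I'm hoping there's a built-in way to handle this.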

Thanks,
Dan
