Re: How to exclude duplicate data while onboarding ...

vikram1583 · ‎04-20-2021

I have a Python script which runs daily and saves its output to a CSV file.

For example: if I run the script today, it gets the data from April 1st to today's date (04/21/2021), and if I run it tomorrow, it gets the data from April 1st to tomorrow's date (04/22/2021). The file name is different every time we run it.

I want to onboard this data into Splunk without duplicate data.

How can we do that?

We have a field called start_time, which we use as the time field.

For example: start_time field value = 04/21/2021 10.30

Example: start_time field value = 04/22/2021 10.30

Thanks in advance

venkatasri · ‎04-21-2021

Hi,

Splunk has built-in protection against re-indexing data it has already read from a monitored file. Have you configured the monitor inputs? If so, please share your inputs.conf and sample data files.

venkatasri · ‎04-21-2021

Hi @vikram1583

What does the data look like in both files? Does it change every time the script runs?

Alternatively, index both files and remove duplicates at search time using Splunk commands such as dedup, or the dc (distinct count) function of stats, depending on your use case.


vikram1583 · ‎04-21-2021

Hi @venkatasri, thanks for your response. It's not only about 2 files. I will run the script every day, and if I ingest those files every day, license usage will increase, so I just want to ingest the new data.

vikram1583 · ‎04-21-2021

The data for previous dates will be the same; each run just adds new data for the current date.
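Since each run only adds rows for the current date, one possible approach is to handle this in the script itself: remember the newest start_time already exported and write only later rows to each day's file. The following is a minimal sketch, not the actual script (which was not shared). It assumes the rows are available as dictionaries with a start_time key in the format shown above, and the function names and the checkpoint file are hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path

# Assumed format, based on the sample value "04/21/2021 10.30".
TIME_FORMAT = "%m/%d/%Y %H.%M"


def parse_start_time(value):
    """Parse a start_time string such as '04/21/2021 10.30'."""
    return datetime.strptime(value.strip(), TIME_FORMAT)


def read_checkpoint(checkpoint_path):
    """Return the newest start_time exported so far, or None on the first run."""
    path = Path(checkpoint_path)
    if not path.exists():
        return None
    text = path.read_text().strip()
    return parse_start_time(text) if text else None


def select_new_rows(rows, last_seen):
    """Keep only rows whose start_time is later than the checkpoint."""
    if last_seen is None:
        return list(rows)
    return [r for r in rows if parse_start_time(r["start_time"]) > last_seen]


def export_new_rows(rows, out_path, checkpoint_path):
    """Write only unseen rows to out_path; return how many were written."""
    last_seen = read_checkpoint(checkpoint_path)
    new_rows = select_new_rows(rows, last_seen)
    if not new_rows:
        return 0  # nothing new, so no file is created for Splunk to pick up
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(new_rows[0].keys()))
        writer.writeheader()
        writer.writerows(new_rows)
    # Advance the checkpoint only after the file was written successfully.
    newest = max(parse_start_time(r["start_time"]) for r in new_rows)
    Path(checkpoint_path).write_text(newest.strftime(TIME_FORMAT))
    return len(new_rows)
```

With this in place, each daily file would contain only the rows added since the previous run, so Splunk indexes each event once. Note that the comparison is strict: a row arriving later with the same start_time as the checkpoint would be skipped, so a finer-grained or unique key may be needed if that can happen in your data.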
