Getting Data In

Event duplication every time I update my JSON input file: can I avoid it, or do I even need to care (i.e. can I just 'dedup' my searches)?

patrickfeerick
New Member

I am a new user to Splunk and had assumed that when using a JSON file as a continuously monitored data source, Splunk would only index the changes to that file. In fact, the event count in the index more or less doubles even if I so much as append a single new JSON record; all entries in the file appear to be re-indexed, and I get massive event duplication. This of course complicates my searches, and I guess it may even 'run out of road' once the file starts to get big. I am somewhat perplexed why Splunk would do this: surely it can recognise that it is duplicating the event? Or are there situations where this behaviour might actually be desirable?
I can see that similar questions have been asked, but I have yet to see a clear-cut answer. Is there a specific combination of settings in (say) props.conf that will fix this? (I haven't touched it yet; it's at the defaults.)
If pushed, I could identify the changes every time the source updates and present only a change-set to Splunk. That would be a last resort, of course 🙂
Thanks for your time
PF


patrickfeerick
New Member

Thank you. In fact I am not changing the existing content of my JSON file; I am always appending new records. But the append is only a logical append: there are scenarios (for me) where new events might get inserted somewhere before the end of the file. I think this is the core of the issue. I had a notion that Splunk might index based on the logical changes to a JSON file (using keys and values), but from what you are saying, it appears it does not work that way. Normally my file updates go to the end of the file anyway, so I'll certainly try out the tail scheme. Thanks for that pointer.


DalJeanis
Legend

Basically, you can tell Splunk what you want it to do. Splunk will happily keep track of where it left off in a file and only index things added after that point. Or it will happily index the entire file each time it changes. CHANGING a file, though, is not going to work very well. Splunk is designed as an append-only system, so there's no way to update a record that was previously indexed: you have to delete the existing event and then index the replacement as a new one.

Yes, there are valid reasons that someone might have otherwise identical records, which only differ by the timestamp of when they were indexed. It's not that common, but it's not unheard of either.
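In the meantime, duplicates that are already in the index can be hidden at search time. A minimal SPL sketch, assuming hypothetical index, sourcetype, and unique-id field names (`myindex`, `myjson`, `record_id` are placeholders for whatever your JSON records actually carry):

```
index=myindex sourcetype=myjson
| dedup record_id
```

This keeps only the most recent event for each `record_id` value, so re-indexed copies are filtered out of results, though they still consume license and disk.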

The solution is to let Splunk tail the file. Take a look at this discussion of the followTail parameter...

https://answers.splunk.com/answers/1036/is-there-a-way-to-tail-a-file-to-index-any-new-changes.html
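Based on that discussion, a minimal inputs.conf sketch might look like the following. The path and sourcetype are assumptions; substitute your own:

```ini
# Monitor the JSON file; followTail = 1 tells Splunk to skip the
# file's existing content when it first sees the file and index
# only data appended afterwards.
[monitor:///var/data/events.json]
sourcetype = myjson
followTail = 1
```

Note that followTail only affects the first time Splunk reads the file; it does not help if records are inserted before the end of the file, since everything after the insertion point will be re-read.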
