Hi.
Just installed Splunk for the first time today. As a test, I took a CSV file and indexed it, and it worked fine. Then I created a new file in CSV format and gzip'ed it.
test.csv.gz
field,val
blah,whatever
It indexed fine. I then edited the file using vi, adding a new line:
newfield,morestuff
I then searched the results again. Now "newfield,morestuff" shows up once in the results, but "blah,whatever" shows up twice. I tried adding more lines and saw the same pattern: the most recent line shows up once, but the older lines are duplicated in the search results.
I then added | dedup _raw
to the search and the duplicates went away. However, I'm looking for a more elegant solution.
By the way, I also tried unzipping the file, editing it, then gzipping it again, with the same results.
Thanks for your help!
It sounds like you were using the "upload" method of adding data to Splunk, which would produce duplicates like the ones you describe. A better way would be to have Splunk monitor your CSV file for changes (Add Data - Monitor - Files & Directories). That way, you can make as many changes as you want to your CSV file without having to re-upload it, and Splunk will detect and index only the changes you've made.
Thanks for the response. Actually, I was already using the "continuously monitor" option that you recommend. I definitely don't want to re-upload my files. As I said, this works well for plaintext CSV files, but it leads to duplication for gzip'ed CSV files.
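For what it's worth, the pattern described above (newest line once, older line twice) is what you would expect if the monitor re-reads the whole archive from the start each time the .gz file changes, instead of picking up only the appended lines. That is an assumption about the behavior, not something confirmed in this thread. A minimal Python sketch of that assumption follows; the `reindex_whole_file` helper and the list standing in for the index are hypothetical illustrations, not Splunk APIs, and the CSV header line is left out for simplicity.

```python
from collections import Counter


def reindex_whole_file(index, file_lines):
    """Append every line of the file to the index.

    Hypothetical stand-in for a monitor that re-reads a changed archive
    from the beginning (an assumption, not confirmed Splunk behavior).
    """
    index.extend(file_lines)


index = []  # stands in for the indexed events (their _raw values)

# First version of test.csv.gz (decompressed contents, header omitted)
reindex_whole_file(index, ["blah,whatever"])

# After adding a line in vi, the whole archive is read again
reindex_whole_file(index, ["blah,whatever", "newfield,morestuff"])

# Count how many times each raw line ended up in the index
counts = Counter(index)
print(counts)

# Rough equivalent of `| dedup _raw`: keep the first occurrence of each line
deduped = list(dict.fromkeys(index))
print(deduped)
```

Under that assumption, `| dedup _raw` hides the duplicates at search time, but the duplicate events are still indexed.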