I set up three DB inputs as part of our requirement to ingest DB logs. The problem is that we're encountering duplicate events when ingesting data into Splunk. I can observe the duplicates whenever there's no new data available to fetch from the DB: Splunk DB Connect keeps using the last checkpoint value and re-ingests the same data until it fetches new data from the DB.
Here are sample internal logs showing how the checkpoint value is generated for one of the DB inputs. Any idea where the issue is and how to fix it?
$SPLUNK_HOME/var/lib/splunk/modinputs/server/splunk_app_db_connect
{"value":"2020-03-02 00:42:39.307","appVersion":"3.1.4","columnType":93,"timestamp":"2020-03-02T11:45:00.019+11:00"}
{"value":"2020-03-02 00:47:17.17","appVersion":"3.1.4","columnType":93,"timestamp":"2020-03-02T11:50:00.084+11:00"}
{"value":"2020-03-02 00:58:00.783","appVersion":"3.1.4","columnType":93,"timestamp":"2020-03-02T12:05:00.164+11:00"}
{"value":"2020-03-02 00:58:00.783","appVersion":"3.1.4","columnType":93,"timestamp":"2020-03-02T12:15:00.018+11:00"}
{"value":"2020-03-02 00:58:00.783","appVersion":"3.1.4","columnType":93,"timestamp":"2020-03-02T12:20:00.624+11:00"}
{"value":"2020-03-02 00:58:00.783","appVersion":"3.1.4","columnType":93,"timestamp":"2020-03-02T12:25:00.017+11:00"}
{"value":"2020-03-02 00:58:00.783","appVersion":"3.1.4","columnType":93,"timestamp":"2020-03-02T12:35:00.423+11:00"}
{"value":"2020-03-02 01:37:59.38","appVersion":"3.1.4","columnType":93,"timestamp":"2020-03-02T12:50:00.108+11:00"}
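For what it's worth, you can see the checkpoint getting stuck by parsing those lines. This is just an illustrative Python sketch, not DB Connect code; I've trimmed the entries to the two relevant fields:

```python
import json

# Checkpoint lines (trimmed) from the internal log above.
lines = [
    '{"value":"2020-03-02 00:47:17.17","timestamp":"2020-03-02T11:50:00.084+11:00"}',
    '{"value":"2020-03-02 00:58:00.783","timestamp":"2020-03-02T12:05:00.164+11:00"}',
    '{"value":"2020-03-02 00:58:00.783","timestamp":"2020-03-02T12:15:00.018+11:00"}',
]

stuck_polls = []
prev = None
for line in lines:
    entry = json.loads(line)
    if entry["value"] == prev:
        # The rising-column value did not advance on this poll.
        stuck_polls.append(entry["timestamp"])
    prev = entry["value"]

print(stuck_polls)  # polls that re-ran with the same checkpoint value
```

Every poll listed in `stuck_polls` is one where the input re-ran its query against the same checkpoint, which is exactly when we see the duplicates.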
Thanks.
Has this issue been resolved, and if so, what was the solution?
My first guess: when you set up the input, did you set it up as
WHERE abc < ? ORDER BY abc DESC
or did you use
WHERE abc <= ? ORDER BY abc DESC
The latter would do it, and it's the only thing I can think of offhand: using an inclusive >= (or <=, whichever direction) instead of a strict > or < re-fetches the checkpoint row on every poll.
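To illustrate the difference, here's a minimal sketch using SQLite in place of your actual database, with a made-up table and column. When no new rows have arrived, a strict comparison against the checkpoint returns nothing, while an inclusive one returns the checkpoint row again on every poll:

```python
import sqlite3

# Hypothetical table standing in for the real DB input's source table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logins (login_time TEXT)")
con.execute("INSERT INTO logins VALUES (?)", ("2020-03-02 00:58:00.783",))

# The last rising-column value stored by the checkpoint.
checkpoint = "2020-03-02 00:58:00.783"

# Strict ">" fetches nothing until a genuinely newer row appears...
strict = con.execute(
    "SELECT login_time FROM logins WHERE login_time > ?", (checkpoint,)
).fetchall()

# ...while ">=" re-fetches the checkpoint row itself on every poll.
inclusive = con.execute(
    "SELECT login_time FROM logins WHERE login_time >= ?", (checkpoint,)
).fetchall()

print(len(strict), len(inclusive))  # 0 1
```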
Hi rich,
It's currently set as WHERE login_time > ? ORDER BY login_time.
I already tried ORDER BY ... DESC, but I kept getting duplicate events as well. I've implemented multiple DB inputs with different connection types before, but this is the first time I've encountered duplicate events with a rising column.
I'm using MS SQL as the connection type now.
Well then, second guess: the timestamp in MSSQL has higher precision than what Splunk is ... recovering? Pulling in? Receiving? Whatever term you want to use.
Can you confirm that the timestamp, when viewed in SQL Server Management Studio, is actually 2020-03-02 00:58:00.783? Because maybe it's actually something like 2020-03-02 00:58:00.783445, and the timestamp specifier on the input needs to be made more precise.
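Here's a tiny sketch of what I mean, with a hypothetical full-precision value. If the column stores sub-millisecond precision but the checkpoint gets truncated to milliseconds, then even a strict > comparison re-matches the same row forever:

```python
# Hypothetical full-precision value as stored in the DB (e.g. datetime2).
db_value = "2020-03-02 00:58:00.783445"

# Checkpoint truncated to millisecond precision (first 23 characters).
checkpoint = db_value[:23]

# Lexicographic comparison mirrors how "WHERE login_time > ?" would
# behave here: the truncated checkpoint is always "less than" the row,
# so the row is re-fetched on every poll.
rematches = db_value > checkpoint
print(checkpoint, rematches)
```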
If that's not it, I may be out of simple ideas. Or maybe not, let's see what happens. 🙂