Hello all,
Just wondering whether any of you have already indexed logs that are stored in a DB and, if so, what the best practice would be.
The application does not log to a file; it writes its logs to the DB. I'd like to index them in Splunk as well, and I've seen that this can be done via scripting. The problem is that I end up with either missing or duplicated data. For instance, if I run the script every 10 minutes and collect the data for the last 10 minutes, I might miss a few milliseconds between runs. If I instead run the script every 9 minutes but still collect the data for the last 10 minutes, I end up with duplicated data.
Any idea on how to get this working?
Regards,
Olivier
I would suggest altering your script so that it picks up where it left off the last time. Record the row_id, sequence number, or timestamp of the last row you collected, and on the next run grab only rows with a value greater than that.
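Picking up where you left off means the last value has to survive between runs. Here is a minimal sketch of that persistence step (the readmaxfile/writemaxfile steps in the wiki pseudocode), assuming the checkpoint is a plain integer such as a seqno kept in a local file. The function names and the file layout are hypothetical, not part of Splunk. Writing to a temp file and renaming it means a crash mid-write can't leave a half-written checkpoint behind.

```python
import os
import tempfile


def read_checkpoint(path, default=0):
    """Return the last value saved at path, or default if no checkpoint exists yet (hypothetical helper)."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # First run: nothing collected yet.
        return default


def write_checkpoint(path, value):
    """Save value atomically: write a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".checkpoint-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(str(value))
        # os.replace overwrites the old checkpoint in a single rename.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        os.unlink(tmp)
        raise
```

If you use a timestamp instead of a sequence number, you would store and parse that value instead of an int.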
http://www.splunk.com/wiki/Apps:DatabaseCollection provides suggestions for this:
If I had a table (a query result, actually) that looked like
seqno, time, message
then I could do this:
oldmax = readmaxfile
max = select max(seqno) from table
select * from table where seqno > oldmax and seqno <= max
writemaxfile(max)
for each of the returned results:
    format nicely (kv pairs work well here!)
    write to stdout
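The steps above can be sketched in Python as follows. This is a minimal illustration, not the wiki's own script. It uses the standard-library sqlite3 module as a stand-in for your real database driver and assumes a hypothetical table named logs with the seqno, time, message columns from the example. The query is bounded by the max read at the start of the run (seqno <= newmax). Without that bound, rows inserted between the two queries would be collected now and again on the next run.

```python
import sqlite3


def format_kv(fields):
    """Render a dict as space-separated key="value" pairs for Splunk to extract."""
    parts = []
    for key, value in fields.items():
        # Escape backslashes first, then double quotes, so values stay unambiguous.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{text}"')
    return " ".join(parts)


def collect(conn, oldmax):
    """Return (events, newmax) covering exactly the rows with oldmax < seqno <= newmax."""
    newmax = conn.execute("SELECT MAX(seqno) FROM logs").fetchone()[0]
    if newmax is None or newmax <= oldmax:
        # Empty table or nothing new since the last run: keep the old checkpoint.
        return [], oldmax
    rows = conn.execute(
        "SELECT seqno, time, message FROM logs "
        "WHERE seqno > ? AND seqno <= ? ORDER BY seqno",
        (oldmax, newmax),
    ).fetchall()
    events = [format_kv({"seqno": s, "time": t, "message": m}) for s, t, m in rows]
    return events, newmax


if __name__ == "__main__":
    # Demo against an in-memory database standing in for the application's DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (seqno INTEGER PRIMARY KEY, time TEXT, message TEXT)")
    conn.executemany(
        "INSERT INTO logs VALUES (?, ?, ?)",
        [(1, "10:00:00", "started"), (2, "10:00:05", "user login")],
    )
    events, checkpoint = collect(conn, 0)       # first run picks up rows 1 and 2
    for line in events:
        print(line)                             # a scripted input indexes stdout
    conn.execute("INSERT INTO logs VALUES (3, '10:09:59', 'user logout')")
    events, checkpoint = collect(conn, checkpoint)  # next run picks up only row 3
    for line in events:
        print(line)
```

collect returns the new max rather than saving it, so you choose when to persist it. Saving it before printing (as in the pseudocode) risks losing rows if the script dies mid-run. Saving it after printing risks re-sending them. This approach also assumes seqno values become visible in increasing order. With concurrent writers, a transaction holding a lower sequence number can commit after a higher one, and such a row would be skipped.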
Exactly what I wanted 🙂 thx!!