I want to start tailing a large table in MS SQL. I have done other tails with DB Connect, but they were small enough that the first run could pull the whole table and not create a problem. However, the one I'm trying to tail is very large, and contains older rows that I don't need. In order to do this, I think I need to do the following:
1) Start the rising_column number at an id from a few months ago
2) Limit the first run and subsequent runs to something like 10,000 rows
Is this possible using DB Connect?
Here's how I do this.. I just put at the end of the query Where myColumn > 234234
you will need to put And inside the brackets as below.
Do not put the "Where myColumn > 234234" inside the {{ and rising_column...}} .. leave outside.. and if it's your rising column, then it will always be bigger than the myColumn value so it doesn't hurt to leave it in there.
So it would be Where myColumn > 234234 {{ and rising_column > ?}}
Also you can always put "select TOP 10000" on your query and order by your ID column ascending
make sense?
Here's how I do this.. I just put at the end of the query Where myColumn > 234234
you will need to put And inside the brackets as below.
Do not put the "Where myColumn > 234234" inside the {{ and rising_column...}} .. leave outside.. and if it's your rising column, then it will always be bigger than the myColumn value so it doesn't hurt to leave it in there.
So it would be Where myColumn > 234234 {{ and rising_column > ?}}
Also you can always put "select TOP 10000" on your query and order by your ID column ascending
make sense?
That makes sense, thanks for the answer. However, I had a pressing commitment to get this data indexed. So, I brewed up a Python script to do the same thing. I would like to get DB Connect working in the future, as it seams easier.
Yes, you can do 1, but I don't see much point to doing 2.
Instead of rising_column > ?
change the ?
to an epoch timestamp of the datetime of your choosing.
Alternatively, you could use the ? to initialize the database input, and then immediately stop splunk and update the state.xml to your choosing - see this post:
http://answers.splunk.com/answers/91112/dbconnect-value-of-risingcolumn
I want to space out the draws. If there are tens of millions of rows from the start date til now, having Splunk pull all of them at once is going to cause trouble for my app.
Yea, but that just delays things. If you say to the indexer only get 10000 rows, you're saying to the indexer "only get 10000 at a time". It will still collect all rows from the rising column value.
I have seen Splunk appear to stop indexing when I index an new db with a dozen GB of backlog, but it wakes up on its own after a short time.
The point of number 2 is to keep DB Connect from overtaxing my DB server and indexing more data than our license permits. Even if I set the beginning id, pulling the last three months worth of rows in one shot would slow our production DB server to a crawl.