
Splunk DB Connect 2: Is there a way to tell how many records were imported per poll, and can we go back an import missing records?

bworrellZP
Communicator

Using Splunk DB Connect 2 to pull in data from SQL. I noticed a few things wrong with the data, so I made some adjustments, after which I noticed other issues.

So, issue one: I am importing data from a server that syncs its data from prod. Because of this, I may be missing data. Is there a way to tell how many records DB Connect imported per poll? And is there a way to go back and import any missing records?

If records do turn out to be missing, is there a way to run DB Connect at a certain time of day to pull all new data since yesterday?

Possibly related: I moved our rising column to DateofAccess from another column. I read that this is a bad idea, but the post did not say why. Is it okay to use that column, or should I use a different one?
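To see why a timestamp like DateofAccess can be risky as a rising column, here is a minimal sketch of how a checkpointed poll behaves when the synced copy receives rows out of order. This is SQLite standing in for the SQL replica, not DB Connect's actual implementation, and all table, column, and value names are made up for illustration.

```python
import sqlite3

# Hypothetical table standing in for the synced SQL data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (id INTEGER, DateofAccess TEXT)")
conn.executemany("INSERT INTO access_log VALUES (?, ?)",
                 [(1, "2015-06-01 09:00"), (2, "2015-06-01 10:00")])

def poll(conn, checkpoint):
    """One DB Connect-style poll: fetch rows past the checkpoint,
    then advance the checkpoint to the highest value seen."""
    rows = conn.execute(
        "SELECT id, DateofAccess FROM access_log "
        "WHERE DateofAccess > ? ORDER BY DateofAccess", (checkpoint,)
    ).fetchall()
    new_checkpoint = rows[-1][1] if rows else checkpoint
    return rows, new_checkpoint

rows, ckpt = poll(conn, "")          # first poll sees both rows
# The sync job now delivers an *older* row late, out of order:
conn.execute("INSERT INTO access_log VALUES (3, '2015-06-01 08:00')")
late, ckpt = poll(conn, ckpt)        # late == [] -- row 3 is silently skipped
```

The late-arriving row has a DateofAccess below the saved checkpoint, so no later poll ever picks it up. A column that strictly increases with arrival order in the replica avoids this.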

Thanks
Brian

1 Solution

Richfez
SplunkTrust
SplunkTrust

So serverB syncs data from production's DB, then you use DB Connect v2 to grab the data from serverB. But serverB's copy of the data is incomplete? Your description of how it's missing isn't very clear and you may need to clarify if nothing we come up with fixes this, but still, I have a few thoughts.

Fix your syncing. If serverB's copy of the data is complete up to a certain time, this isn't an issue, and it shouldn't be an issue for Splunk either. If, though, serverB's copy of the data has "gaps" in the middle, then something is wrong with your method of syncing, and that should be fixed.

If there are gaps like those described above that you can't fix with better/different syncing, there are two ways I can think of to get around that. The first: delete all your data each time and have the Splunk input grab the entire table each time. This can be brutal on a license, though.

Another thought: have the destination tables include a date/time column that auto-fills with the time of row creation in the synced DB, then use that column as your rising column. You may need to work with your DBA to sort that out. The idea is that each row, when it gets added to the synced/destination DB, has that field set to the current time. Pulling on that as your rising column will then, each time, grab only items that are "new" to the synced DB.
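A minimal sketch of that idea, using SQLite's DEFAULT CURRENT_TIMESTAMP as a stand-in for whatever auto-fill mechanism your DBA would set up on the real server; the table and column names (access_log, synced_at) are hypothetical.

```python
import sqlite3

# The replica table gets a column that defaults to the row's insert time
# *there*, so a rising column on it follows arrival order, not DateofAccess.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE access_log (
        id INTEGER,
        DateofAccess TEXT,
        synced_at TEXT DEFAULT CURRENT_TIMESTAMP  -- set when the row lands
    )""")
conn.execute(
    "INSERT INTO access_log (id, DateofAccess) VALUES (1, '2015-06-01 09:00')")
# A row that is days old in prod still arrives "now" in the replica:
conn.execute(
    "INSERT INTO access_log (id, DateofAccess) VALUES (2, '2015-05-30 02:00')")
stamps = [r[0] for r in conn.execute("SELECT synced_at FROM access_log")]
# Both rows carry a fresh synced_at, so a rising column on synced_at
# catches the late row on the next poll.
```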

Alternatively, why can't you use a read-only connection straight to the production DB? I mean, sometimes you can't, but sometimes the only reason you can't is because you haven't asked. 🙂


bworrellZP
Communicator

Good points. The prod server is a data warehouse doing a lot of things: pulling data from lots of other places, running jobs, etc. Given that load, they sync data to a read-only data warehouse. My gut tells me the sync job does not replicate in date/time order but in some other blocks; the sync job seems to take different lengths of time to complete.

I'm thinking I should change my import to a cron job and pull data with a query that covers d-1 (the previous day). Then I should not have an issue.
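The d-1 idea can be sketched like this: a bounded date range means each nightly run covers exactly one full day, regardless of when the sync job finished. Again SQLite stands in for the warehouse, and all names and date formats are illustrative.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (id INTEGER, DateofAccess TEXT)")

today = date.today().isoformat()
yesterday = (date.today() - timedelta(days=1)).isoformat()
conn.executemany("INSERT INTO access_log VALUES (?, ?)", [
    (1, f"{yesterday} 09:00"),   # yesterday's row: included
    (2, f"{today} 01:00"),       # today's row: excluded, picked up tomorrow
])

# The nightly (cron-scheduled) query: everything from yesterday, and
# nothing more recent, so partial syncs of today never leave a gap.
rows = conn.execute(
    "SELECT id FROM access_log WHERE DateofAccess >= ? AND DateofAccess < ?",
    (yesterday, today),
).fetchall()
# rows contains only the record from yesterday
```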


Richfez
SplunkTrust
SplunkTrust

If you don't need absolutely up-to-the-moment data, a "laggy" import like that could certainly do the trick.

Let us know what you finally figure out!


bworrellZP
Communicator

Did testing yesterday and had a job run this morning via cron. It appears I may always get some duplicate data, due to how it's pulled into the warehouse: maybe 20 to 1,000 records out of the 250k pulled in per night. That said, since there is a unique key, I can just dedup it in the searches and dashboards. The cron worked, though.
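The overlap-then-dedup approach can be sketched like this; the function mimics what SPL's dedup command does at search time, keeping the first event seen per unique key. Field names and values are made up for illustration.

```python
# Two nightly pulls whose windows overlap slightly, re-ingesting key 102.
night1 = [{"key": 101, "val": "a"}, {"key": 102, "val": "b"}]
night2 = [{"key": 102, "val": "b"}, {"key": 103, "val": "c"}]

def dedup(events, key="key"):
    """Keep the first event for each unique key, like `| dedup key` in SPL."""
    seen, out = set(), []
    for e in events:
        if e[key] not in seen:
            seen.add(e[key])
            out.append(e)
    return out

unique = dedup(night1 + night2)
# unique holds three events: keys 101, 102, 103
```

A small, bounded overlap plus search-time dedup trades a little license volume for never missing a late-synced row.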
