All Apps and Add-ons

Is there a way to implement data integrity control on Hadoop archives?

jcampbell1977
Explorer

We are going through some compliance work and I need to be able to prove our data's integrity. How would I go about doing this on a virtual index? We are using Hadoop to read the data from S3 in AWS.

dwaddle
SplunkTrust

You really can't use the Splunk Data Integrity feature with Hadoop virtual index data. Splunk did not process that data through its own indexing pipeline, so the pipeline's Data Integrity features never touched it. You would have to build your own file hashing and signing solution for HDFS and make sure it checks all of your compliance boxes.
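A roll-your-own solution along those lines usually boils down to computing a cryptographic digest for each archived file and storing the digests somewhere tamper-evident. Here's a minimal sketch in Python; the directory path is hypothetical, and in practice you'd run this against a local staging copy before the files land in S3/HDFS:

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(archive_dir):
    """Map each file's path (relative to the archive root) to its SHA-256 digest."""
    root = Path(archive_dir)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

For a signing step on top of the hashes (which an auditor may want), you'd sign the serialized manifest with a key the archiving process can't modify after the fact.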


jcampbell1977
Explorer

Well, the data did go through Splunk first. We are rolling our data off, using AWS S3 as our archiving solution, and using Splunk Analytics for Hadoop to read it back through Splunk.
Are you aware of any file hashing components for HDFS?


dwaddle
SplunkTrust

Oh, well that changes things a little bit! If the data went through Splunk's Data Integrity feature as it was indexed, before it wound up in HDFS, then the hashes generated when each bucket rolled from hot to warm should still be valid on the copy of the bucket in S3. The question becomes whether there is a way to check those, and I'm honestly not sure.
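Two avenues worth checking, neither of which I can vouch for against your exact setup: the Splunk Enterprise docs describe a `splunk check-integrity -bucketPath <path>` CLI that verifies a bucket's Data Integrity hash files, so copying a bucket back from S3 to local disk and running that against it may work if the hash files survived the archive. Failing that, a homegrown manifest of digests can be re-verified after retrieval; here's a hedged sketch that checks files against a previously recorded SHA-256 manifest (the manifest format here is an assumption, just relative path to hex digest):

```python
import hashlib
from pathlib import Path

def verify_manifest(archive_dir, manifest):
    """Recompute each file's SHA-256 and report anything missing or mismatched.

    `manifest` maps relative file paths to expected hex digests, as recorded
    before the files were archived. Returns a list of (path, problem) pairs;
    an empty list means everything checked out.
    """
    failures = []
    for rel_path, expected in manifest.items():
        path = Path(archive_dir) / rel_path
        if not path.is_file():
            failures.append((rel_path, "missing"))
            continue
        # Fine for a sketch; stream in chunks instead for very large files.
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected:
            failures.append((rel_path, "hash mismatch"))
    return failures
```

Run on a schedule against the retrieved copies, the empty-vs-nonempty result gives you an auditable pass/fail record for the compliance folks.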
