Sanitize already indexed data

mslvrstn
Communicator

There is some data that we want to sanitize in Splunk. I already have a SEDCMD that handles newly indexed data, but is there some way to modify events that have already been indexed? At worst I will delete the events, but ideally I would like to just XXX out a specific field.

davidpaper
Contributor

Hi,

Data in Splunk is indeed immutable. That doesn't mean that, with a little work, the data can't be cleaned up and made available for search without the PII in it.

0) You already nailed part of the solution: a SEDCMD keeps the problem from getting worse as new data is indexed.
1) Create a search that finds all the events containing the PII that needs to be cleansed. Run it from the ./splunk CLI and dump the results to a file.
2) Use your favorite text mangling tools (sed, awk, perl, LISP 🙂 ) to sanitize the data on disk.
3) Run the original search again, this time with '... | delete' at the end, to mark the existing events so they are no longer included in search results. Note that | delete only hides the events from searches; it does not remove them from disk or reclaim space.
4) Use ./splunk add oneshot to re-index the sanitized data file.
5) Enjoy a frosty beverage of your choice for a job well done.

The original search and the | delete may take quite a while, depending on how many events need to be found and extracted. The oneshot will slurp the file back in as fast as the forwarder/indexer can absorb it. Note that oneshot WILL count against the license, so plan accordingly.
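
For step 2, here is a minimal Python sketch of the "text mangling" pass, in case sed or awk isn't handy. It assumes the dump from step 1 is raw event text, one line at a time, and that the sensitive value appears as key=value. The field name ssn, the pattern, and the file names are all hypothetical, so adjust them to match your own events.

```python
import re

# Hypothetical pattern: masks the value of a key=value field named "ssn".
# Change the field name and value pattern to fit your own data.
FIELD_PATTERN = re.compile(r'(\bssn=)\S+')


def sanitize_line(line: str) -> str:
    """Replace the sensitive field's value with XXX; leave the rest untouched."""
    return FIELD_PATTERN.sub(r'\1XXX', line)


def sanitize_file(src_path: str, dst_path: str) -> int:
    """Write a masked copy of src_path to dst_path; return the number of lines changed."""
    changed = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            clean = sanitize_line(line)
            if clean != line:
                changed += 1
            dst.write(clean)
    return changed
```

Calling sanitize_file("dump.txt", "dump_clean.txt") (both names made up) would give you the file to feed to the oneshot in step 4. Comparing the returned count against the number of events the original search found is a cheap sanity check before you run the | delete.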

Jason
Motivator

As far as I know, once it is indexed, it is immutable. You can restrict access to the data via a role's search strings, and you can use | rex mode=sed ... to hide data at search time. Perhaps combine both to enforce a sed for a particular role?
