Good afternoon, fellow splunkthiasts, I need your help with data anonymization.
Situation: An application on a server with a universal forwarder (UFW) produces a log. Most of it is boring operational stuff, but certain records contain a field considered sensitive. The log is needed by two audiences: ordinary Ops admins, who must see all records but don't need the actual sensitive field value, and privileged troubleshooters, who need to see the sensitive data too.
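For illustration only (field names and values are invented), a sensitive record might look like this, with "token" being the field to protect:

    2024-05-14 12:03:11 INFO action=login user=jdoe token=SECRET123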
Architecture: Data is produced on the server with the UFW, will be stored on an indexer cluster, and there is one heavy forwarder (HFW) available in my deployment.
Limitations:
1. Due to limited bandwidth between the UFW and the Splunk servers, it is preferred not to increase the volume of data transferred from the UFW (bandwidth between the HFW and the indexers is fine).
2. Because the sensitive field is only valid for a limited time, the delay introduced by a search -> modify -> re-index cycle every few minutes is not acceptable.
3. Indexing the sensitive records twice is OK; indexing the whole log twice would be fun, but too expensive.
Proposed solution: The UFW will forward the log to the heavy forwarder, where it should be duplicated (see the sketch below). One copy of the data should be anonymized and forwarded to index "operational", while the other should be filtered (keeping only records with the sensitive field) and then forwarded to index "sensitive".
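In picture form, the intended flow (index names as above):

    UFW --- raw log ---> HFW ---+--> copy A: anonymize all records ---> index=operational
                                +--> copy B: keep sensitive only -----> index=sensitive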
Problem: I know how to route data, how to anonymize data, and how to filter data before routing, but I am not sure how to connect the dots in the manner described. Specifically, I don't know how to duplicate the data on the HFW and make sure each copy is treated differently.
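To show where I'm stuck, here is roughly how I would do each piece on its own (a sketch only; sourcetype, index, and field names are placeholders, not a working combined config). Anonymization and routing via props.conf on the HFW:

    [my_app_log]
    # mask the sensitive value at parse time (hypothetical field "token")
    SEDCMD-mask_token = s/token=\S+/token=####MASKED####/g
    # send events containing the sensitive field to another index
    TRANSFORMS-route_sensitive = route_sensitive

and the routing/filtering side in transforms.conf:

    [route_sensitive]
    REGEX = token=
    DEST_KEY = _MetaData:Index
    FORMAT = sensitive

The trouble is that both settings act on the same single stream of events for that sourcetype, so the SEDCMD masks the "sensitive" copy as well; I see no obvious way to fork the stream so that each copy passes through a different set of transforms.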
Can you help, or perhaps propose a better solution?