Splunk Search

How do I mask a field at search time only if the data is more than 30 days old?

lyndac
Contributor

I have a requirement to mask the value of a field after 30 days.

The events are JSON. Users need to be able to see and search all of the fields except one for up to a year; that one field must be hidden from view after 30 days.

My plan was to define a calculated field that overwrites the value of the field with one I supply whenever _time is more than 30 days ago. The calculation would be performed for every search. I failed to consider two things:

First, the field to be overwritten is a JSON field named foo{}.id. If I use

|eval foo{}.id = if ((_time < (now() - (86400*30))), "TOO OLD", foo{}.id)

I get an error that the eval is malformed. If I add quotes around the field names, like this:

|eval "foo{}.id" = if ((_time < (now() - (86400*30))), "TOO OLD", "foo{}.id")

I get a new field called foo.id which equals TOO OLD, but I still have the original foo{}.id with its original value.
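From the eval docs, I suspect the right-hand side needs single quotes rather than double quotes (double quotes make a string literal, while single quotes refer to the field's value), perhaps together with a rename to dodge the braces entirely. Maybe something like this:

| rename "foo{}.id" AS foo_id | eval foo_id = if (_time < (now() - (86400*30)), "TOO OLD", foo_id)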

Second, even if I can get the calculated field to behave properly, the original value is still present in _raw, which is easily visible in the events view or by adding _raw to a table.

So, is it possible to overwrite a single field at search time such that every search will return the overwritten value?

Also, can I somehow remove the _raw field for every search, and if so, are there any weird consequences from doing that?

0 Karma

woodcock
Esteemed Legend

I would do this: at index time, modify the event to create a hash of the time-sensitive field and replace the field value in the raw event with the hash. At the same time, store the original value along with the hash and a date in a KV store, so that the data exists in two separate places. Then, every day, purge the KV store of any records older than 30 days. At search time, use a lookup on the hash in the event to pull the field value back in from the KV store; after 30 days the lookup will fail and only the hash will be visible.
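Roughly like this, assuming a KV store lookup named masked_values with fields hash, value, and insert_time (all of these names are placeholders). At search time:

index=myindex | rename "foo{}.id" AS hash | lookup masked_values hash OUTPUT value AS foo_id | eval foo_id = coalesce(foo_id, "TOO OLD")

And a scheduled daily purge that keeps only the last 30 days of records (assuming insert_time is stored as epoch time):

| inputlookup masked_values | where insert_time >= relative_time(now(), "-30d") | outputlookup masked_values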

lyndac
Contributor

This sounds like a great approach. So I'd need a script to pre-process the data files before they are given to the Splunk Universal Forwarder, right?

0 Karma

woodcock
Esteemed Legend

You've got it.

0 Karma

woodcock
Esteemed Legend

You will need to re-index the event after modifying it and then delete the original event. You can use collect to do this.
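Something along these lines, for example (the index name and the sed expression are placeholders; you would need a regex that matches your actual JSON):

index=myindex earliest=-1y latest=-30d | rex mode=sed field=_raw "s/\"id\"\s*:\s*\"[^\"]*\"/\"id\":\"TOO OLD\"/g" | collect index=myindex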

0 Karma

lyndac
Contributor

I saw a reference to this solution in another answer, but didn't understand it. I thought summary indexes were mainly used to collect the output of stats commands so you can keep counts longer than the raw data is retained. How does a summary index work when you just want to re-index an entire event that is already indexed? Does it just send the _raw field value through the index/parsing pipeline again? If so, do I just need to use |rex to mask the field in the raw JSON?

Are the same props and transforms applied to the summary-indexed data as are applied to the original data? I want to make sure that I can just add the summary index to all of my searches and have them still work.

Any details you can give me would be greatly appreciated. I'd really like to more fully understand how this works.

Thanks...

0 Karma

woodcock
Esteemed Legend

Although collect is intended to write to a summary index, it can in fact write to any index. Play around with it and you will see what it does.

| noop | stats count AS TestOfCollect | collect index=myIndex

Then check it out:

index=myIndex | where isnotnull(TestOfCollect)

Then throw it away and refine:

index=myIndex | where isnotnull(TestOfCollect) | delete
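
Note that the delete command requires the can_delete role, which no user (not even admin) has by default.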

0 Karma

woodcock
Esteemed Legend

Be aware that using collect to write to a non-summary index will incur a double license hit: the events count against your license again when they are indexed the second time.
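If I recall correctly, data collected with the default stash sourcetype is not metered against your license, but overriding the sourcetype (which you would probably want to do here so that your existing searches and props still apply) causes the re-indexed events to be metered like new data:

| collect index=myIndex sourcetype=my_sourcetype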

0 Karma