Remove 99% of Data from a file with Transforms.conf

robertlynch2020
Motivator

I have data coming into Splunk under the sourcetype [service], but I only need the file name, not the data in the file.

The data is getting in, but I need to reduce it.

So I am trying to reduce the data with a regex before it reaches the index. For example, I think I would have to keep one character from each file so that the file still registers and I can use the file name. I have the REGEX below, but it's not working. Any ideas?

transforms.conf
[NoInfo_100]
REGEX = .$
DEST_KEY = queue
FORMAT = nullQueue

props.conf
[service]
TRANSFORMS-filter = NoInfo_100

Thanks in Advance
Robert Lynch

micahkemp
Champion

Your configuration above routes every matching line to nullQueue, so those lines are never indexed. That's not what you described. The transform below should instead rewrite each log line to just its first character.

[NoInfo_100]
REGEX = (.)
DEST_KEY = _raw
FORMAT = $1
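
For context, here is a minimal sketch of how this stanza might be wired up end to end, assuming the `[service]` sourcetype and the `TRANSFORMS-filter` class name from the question. The `^` anchor is an addition here to make "first character" explicit. Since only `_raw` is rewritten, the file path should still be available in the `source` field.

```ini
# props.conf (on the indexer or heavy forwarder that parses the data)
[service]
TRANSFORMS-filter = NoInfo_100

# transforms.conf
[NoInfo_100]
# Capture only the first character of the event
REGEX = ^(.)
# Overwrite the raw event text with the captured character
DEST_KEY = _raw
FORMAT = $1
```

Index-time transforms like this only affect data indexed after the change, and the parsing instance typically needs a restart to pick them up.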


DalJeanis
Legend

If there is a header, or any other record that will ALWAYS be present in small but nonzero numbers, then use @mayurr98's solution to route everything but the header to nullQueue.

If no type of record qualifies, then one character per record might be the best you can do.

mayurr98
Super Champion

This is done by defining a regex that matches the events you need and sending everything else to nullQueue.

Here is a basic example that drops everything except events that contain the string login:

props.conf

[source::/var/log/foo]
 # Order matters: setnull first routes every event
 # to nullQueue, then setparsing overrides the queue
 # for events that match, so only those reach the
 # index processor
 TRANSFORMS-set = setnull, setparsing

In transforms.conf

[setnull]
 REGEX = .
 DEST_KEY = queue
 FORMAT = nullQueue

[setparsing]
 REGEX = login
 DEST_KEY = queue
 FORMAT = indexQueue
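
As a hedged adaptation of the same pattern to the sourcetype in the question: the sketch below assumes each file contains a header line starting with a hypothetical marker `#HEADER`. The stanza names `service_setnull` and `service_keep_header` are made up for illustration and do not come from the thread.

```ini
# props.conf
[service]
TRANSFORMS-filter = service_setnull, service_keep_header

# transforms.conf
[service_setnull]
# Match every non-empty event and send it to nullQueue
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[service_keep_header]
# Hypothetical header pattern: replace with whatever marks the
# one record per file that should be kept
REGEX = ^#HEADER
DEST_KEY = queue
FORMAT = indexQueue
```

With this ordering, only the header event from each file would be indexed, and its `source` field would still carry the file name.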

Let me know if this helps!

robertlynch2020
Motivator

This is really great guys, thanks.

Do you think it is possible to take in only one character per file, not per line?

In a perfect world I just want to look at the data in the file name; the data inside the file is not useful.

robertlynch2020
Motivator

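I got this working by adding BREAK_ONLY_BEFORE with a pattern that never matches, so the lines are merged into one event instead of one event per line: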

[service]
BREAK_ONLY_BEFORE=ererererererer
TRANSFORMS-filter = NoInfo_100
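
A slightly fuller sketch of the same idea, for readers whose files are long. It assumes line merging is left on (the default) and raises `MAX_EVENTS`, which caps how many lines are merged into one event (default 256). The value 10000 is an assumption, not something from the thread.

```ini
# props.conf
[service]
# BREAK_ONLY_BEFORE only applies when line merging is on (the default)
SHOULD_LINEMERGE = true
# Placeholder pattern that is not expected to appear in the data
BREAK_ONLY_BEFORE = ererererererer
# Assumed value: raise the cap on lines merged into a single event
# (the default is 256) if files can be longer than that
MAX_EVENTS = 10000
TRANSFORMS-filter = NoInfo_100
```

Other line-merging settings, such as `BREAK_ONLY_BEFORE_DATE`, may still split events if lines in the file carry timestamps, so it is worth checking the event count per `source` after the change.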

jkat54
SplunkTrust

Would it be easier just to run a script that prints out the file names in this directory?

Your current regex matches the last character of each line, and because the transform sends every match to nullQueue, every non-empty line is dropped.

I really think you should use a script to list the names instead of using the indexing pipeline to transform the data.
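
One way the script approach might be set up is as a scripted input. This is only a sketch: the app name `my_app`, the script `list_files.sh`, and the sourcetype `service_filenames` are hypothetical, and the script itself would simply print one file name per line to stdout.

```ini
# inputs.conf
# Hypothetical script path and names; the script itself would just
# print one file name per line to stdout
[script://$SPLUNK_HOME/etc/apps/my_app/bin/list_files.sh]
# Run every 300 seconds
interval = 300
sourcetype = service_filenames
disabled = false
```

This keeps the file contents out of the indexing pipeline entirely, at the cost of maintaining a small script.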
