Getting Data In

Minimize an event then forward it to an indexer

cronin2004
Explorer

Hello,

I've been searching forever and I can't seem to find the answer.
The documentation I have found so far only says that it is possible to filter specific EVENTS, but not to forward a simplified version of one.

Here is the issue:
Let's say I have a log with the following format (totally made up - but you get the idea) that I send to Splunk to be indexed:

--

#Fields: date time clientip User-Agent stats responsetime statuscode

2012-07-13 20:53:00 10.100.10.100 Mozilla type=something&loc=somewhere&id=11111 10 200

--

How could I parse the DATA itself, so that, say, the only thing that is forwarded is

2012-07-13 20:53:00 type=something&loc=somewhere 10 200 (Note: as in this example, I want to parse out some of the space-delimited strings as well as substrings within a field.)

So I can't simply trim the length of the event, because I need to exclude things WITHIN the event.
(The #Fields line is irrelevant; as I understand it, I would put that information in the transforms.conf file.)

All I have found is how to filter EVENTS, not PARTIAL events. As you can guess, I don't have control over how the log files are created, only over what I am given.

So the question is, can I forward minimized events to the indexers?
If so, how?

Please let me know if I need to give more information.


lguinn2
Legend

Yes, you can! The trick is to parse the events using a regular expression. The following example leverages a technique that is documented in Anonymize Data. The regular expression defines "capture groups" using parentheses and then rewrites the raw event data with only the items that are captured.

Before you actually use this technique to index data, you might want to test the regular expression against some of your data using a tool like RegexBuddy. Or run it through a test index in Splunk.

props.conf

[yoursourcetypehere]
TRANSFORMS-t1=reformat-input

transforms.conf

[reformat-input]
# Capture the date/time, the type= and loc= pairs, and the trailing fields;
# the client IP, the user agent, and &id=... are dropped because they are not captured
REGEX = ^(\S+ \S+) \S+ Mozilla( type=.*?&loc=.*?)&id=\d+(.*)
FORMAT = $1$2$3
DEST_KEY = _raw
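
If you want a quick sanity check of the regex outside of Splunk, here is a minimal sketch using Python's re module as a rough stand-in for RegexBuddy (Python's regex flavor is close enough to Splunk's PCRE for this particular pattern); the sample event is the one from the question:

import re

# the sample event from the question
raw = "2012-07-13 20:53:00 10.100.10.100 Mozilla type=something&loc=somewhere&id=11111 10 200"

# the same REGEX as in transforms.conf above
pattern = re.compile(r"^(\S+ \S+) \S+ Mozilla( type=.*?&loc=.*?)&id=\d+(.*)")

# FORMAT = $1$2$3 keeps only the captured groups as the new _raw
print(pattern.sub(r"\1\2\3", raw))
# prints: 2012-07-13 20:53:00 type=something&loc=somewhere 10 200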

Last, and very important: you cannot do this using a Universal Forwarder. This must be done wherever the data is parsed. So, if you are using a Universal Forwarder, do this on the indexer. Or use a "heavy" forwarder - it can parse the data before sending it. In either case, the full, unmodified event will never be placed in the Splunk index and will not count against your Splunk license.

cronin2004
Explorer

Awesome, thanks!


lguinn2
Legend

1) On the heavy forwarder itself (wherever it is). The heavy forwarder parses the input data before shipping it to the indexer.

2) Yes. The Universal Forwarder does not parse data; it simply grabs blocks of the input and sends the data to the indexer for parsing and indexing.

In either case, Splunk will index the data in the new format as if it were the original.
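
As a concrete example (assuming a default installation layout - an app's local directory works too), the two files would live on whichever instance does the parsing:

$SPLUNK_HOME/etc/system/local/props.conf
$SPLUNK_HOME/etc/system/local/transforms.conf

That would be on the heavy forwarder if you have one, otherwise on the indexer.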


cronin2004
Explorer

Awesome, thanks! I'll have to check with my lead on what type of forwarding we have. So just to quickly check:

1) If I am using a heavy forwarder - do I do this on the box that I am forwarding data from, or on the forwarder itself (editing these files specifically)?

2) To reiterate, if I am using a Universal Forwarder (though truthfully I'm not totally sure which it is), then I do it on the indexer that is taking in the data? And this will index the data in this new format (treating it as though it were the raw event)?

Thanks a lot, I'll also check out the link you sent as well.
