Hello,
I've been searching forever and I can't seem to find the answer.
The documentation I have found so far only says that it is possible to filter specific EVENTS, not to forward a simplified version of them.
Here is the issue:
Let's say I have a log with the following format (totally made up, but you get the idea) that I send to Splunk to be indexed:
--
#Fields: date time clientip User-Agent stats responsetime statuscode
2012-07-13 20:53:00 10.100.10.100 Mozilla type=something&loc=somewhere&id=11111 10 200
--
How could I parse the DATA itself, so that, say, the only thing that is forwarded is
2012-07-13 20:53:00 type=something&loc=somewhere 10 200 (Note: As per this example, I want to parse out some of the strings delimited by the space as well as substrings)
So I can't simply trim the length of the event because I need to exclude things WITHIN the event.
(The #Fields line is irrelevant; as I understand it, I would put this information in the transforms.conf file.)
All I have found is how to filter EVENTS not PARTIAL events. As you can guess I don't have control over how the log files are created, only what is given.
So the question is, can I forward minimized events to the indexers?
If so, how?
Please let me know if I need to give more information.
Yes, you can! The trick is to parse the events using a regular expression. The following example leverages a technique that is documented in Anonymize Data. The regular expression defines "capture groups" using parentheses and then rewrites the raw event data with only the items that are captured.
Before you actually use this technique to index data, you might want to test the regular expression against some of your data using a tool like RegexBuddy, or run it through a test index in Splunk.
props.conf
[yoursourcetypehere]
TRANSFORMS-t1=reformat-input
transforms.conf
[reformat-input]
REGEX = ^(\S+ \S+) \S+ Mozilla( type=.*?&loc=.*?)&id=\d+(.*)
FORMAT = $1$2$3
DEST_KEY = _raw
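One quick way to sanity-check a regular expression like this before deploying it is to run it over a sample line in a small script. The following is only a rough sketch in Python, not Splunk itself: the helper name `minimize` is made up for illustration, and the pattern assumes the user agent is the single literal token "Mozilla", as in the made-up sample (real user-agent strings contain spaces and would need a different pattern). The pattern captures the date and time, skips the client IP and user agent, keeps the `type` and `loc` parameters, drops `id`, and keeps the rest of the line. Concatenating the groups mimics what FORMAT = $1$2$3 does.

```python
import re

# Same capture-group idea as a transforms.conf REGEX, written in Python syntax.
# Assumption: the user agent is the literal single token "Mozilla".
PATTERN = re.compile(r"^(\S+ \S+) \S+ Mozilla( type=.*?&loc=.*?)&id=\d+(.*)")


def minimize(event: str) -> str:
    """Return the rewritten event, or the event unchanged if the pattern does not match."""
    match = PATTERN.match(event)
    if match is None:
        # A non-matching event is passed through untouched.
        return event
    # Equivalent of FORMAT = $1$2$3: concatenate the captured groups.
    return "".join(match.groups())


sample = (
    "2012-07-13 20:53:00 10.100.10.100 Mozilla "
    "type=something&loc=somewhere&id=11111 10 200"
)
print(minimize(sample))
```

For the sample line this prints `2012-07-13 20:53:00 type=something&loc=somewhere 10 200`. Lines that do not match, such as the `#Fields:` header, come back unchanged, so it is worth running a handful of real log lines through it to see which ones slip past the pattern.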
Last and very important - you cannot do this using a Universal Forwarder. This must be done wherever the data is being parsed. So, if you are using a Universal Forwarder, do this on the indexer. Or, use a "heavy" forwarder - it can parse the data before sending it. In either case, the full data will never be placed in the Splunk index and will not count against your Splunk license.
1) On the heavy forwarder itself (wherever it is). The heavy forwarder parses the input data before shipping it to the indexer.
2) Yes. The Universal Forwarder does not parse data; it simply grabs blocks of the input and sends the data to the indexer for parsing and indexing.
In either case, Splunk will index the data in the new format as if it were the original.
Awesome, thanks! I'll have to check with my lead on what type of forwarding we have. So, just to quickly check:
1) If I am using a heavy forwarder, do I do this on the box that I am forwarding data from, or on the forwarder itself (editing these files specifically)?
2) To reiterate, if I am using a Universal Forwarder (though truthfully I'm not totally sure what it is), do I do it on the indexer that is taking in the data? And will this index the data in the new format (treating it as though it were the raw data)?
Thanks a lot, I'll also check out the link you sent as well.