I'm trying to minimize the amount of data Splunk indexes, but I'm dealing with very large log files. At the moment I filter the events in these logs with a regex so that only the events I need are indexed. However, I'd like to shrink the indexed data even further by capturing only one field from each event. Here is a sample from a typical log file:
00:22:15.911 - M:ReadByID TradeMeOrganisationWorker,D:0ms,C:1,S:
00:22:32.119 - M:ReadMultiple vwTMNewAutoListing,D:7427ms,C:0,S: at LTI.Services.Concrete....
00:22:34.397 - M:ReadMultiple vwListingQuestion,D:32ms,C:0,S:
The second event (the ReadMultiple vwTMNewAutoListing line) is the specific M type I'd like to capture, i.e. my regex search is 'ReadMultiple vwTMNewAutoListing'. However, the only information I'm interested in is the duration (the D: value), and I'd like to discard all the rest of the data. The S: field is a stack trace and can be quite large.
This is my current config:
props.conf
[hostfile]
pulldown_type = true
SHOULD_LINEMERGE = False
CHECK_FOR_HEADER = false
TRANSFORMS-set = setnull,newautolisting
REPORT-extract = durationMS
transforms.conf
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[newautolisting]
REGEX = ReadMultiple vwTMNewAutoListing
FORMAT = indexQueue
DEST_KEY = queue
[durationMS]
REGEX = ReadMultiple vwTMNewAutoListing,D:(?<DurationMS>\d+)ms
FORMAT = DurationMS::$1
DEST_KEY = queue
As you can see, I'm sending all events that do not match the regex to the nullQueue. The end result, however, is that I index the complete raw event for every match on 'ReadMultiple vwTMNewAutoListing', and the DurationMS field is only created at search time. Is it possible to create the DurationMS field at index time and discard the rest of the raw event?
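To make the goal concrete, here is an untested sketch of the kind of setup I have in mind, not a confirmed solution. It assumes a third index-time transform can rewrite _raw after the routing transforms have run, so that only the timestamp and the duration are kept. The stanza name trimraw and the output layout "DurationMS=<value>" are hypothetical names chosen for illustration.

```
# props.conf (sketch): append the hypothetical trimraw transform after routing
[hostfile]
SHOULD_LINEMERGE = false
TRANSFORMS-set = setnull,newautolisting,trimraw

# transforms.conf (sketch): replace the raw event with timestamp + duration only
[trimraw]
# $1 = timestamp, $2 = duration in milliseconds
REGEX = ^(\d{2}:\d{2}:\d{2}\.\d{3}) - M:ReadMultiple vwTMNewAutoListing,D:(\d+)ms
FORMAT = $1 DurationMS=$2
DEST_KEY = _raw
```

If this works as assumed, the sample event above would be indexed as "00:22:32.119 DurationMS=7427", with the stack trace dropped, and the key=value pair would presumably be picked up by automatic field extraction at search time.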