Given this portion of a log record...
0000029200000004A2460570312310/01/1380689725o4 2.0 0140200002100121552538 (...)
................============................==.......========............
I need a way to index the portions of the log record that sit above the equal signs shown below the record. Thus far, I've seen that Splunk is adept at indexing things that fall on word boundaries; unfortunately, these don't.
It is possible to use a sed command (SEDCMD) in props.conf to manipulate records before indexing. Similarly, it is relatively easy to manipulate records prior to indexing by stitching together capture groups in transforms.conf.
While those options are available, they should be used sparingly, because extracting fields at search time provides much greater flexibility. Instead of baking your decisions in at index time, Splunk lets you extract fields at search time without restarting services or re-indexing data. This is hugely beneficial if you discover a month later that you need another field or piece of data -- or if the format changes upstream from your area of influence.
0000029200000004A2460570312310/01/1380689725o4 2.0 0140200002100121552538 (...)
................============................==.......========............
As lukejadamec points out, this is going to be a regex-based extraction. Assumptions:
1) fixed length of what to skip, and what to extract
2) this is the start of the event
3) single line event
In props.conf:
[your_source_or_sourcetype]
EXTRACT-blah = ^.{16}(?<field1>.{12}).{16}(?<field2>..)\s+\S+\s+..(?<field3>.{8})
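As a quick sanity check outside Splunk, the pattern can be run against the sample record. This is only a sketch: Python's re module needs (?P<name>...) for named groups instead of the (?<name>...) form used in the EXTRACT line, and the helper names extract_fields and fixed_slices are made up for illustration. The slice offsets are an assumption read off the marker line (skip 16, take 12, skip 16, take 2, skip 7, take 8).

```python
import re

# Sample record copied from the question, with the trailing "(...)" dropped.
record = "0000029200000004A2460570312310/01/1380689725o4 2.0 0140200002100121552538"

# Same pattern as the EXTRACT line, rewritten with Python's (?P<name>...)
# named-group syntax in place of (?<name>...).
pattern = re.compile(
    r"^.{16}(?P<field1>.{12}).{16}(?P<field2>..)\s+\S+\s+..(?P<field3>.{8})"
)


def extract_fields(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


def fixed_slices(line):
    """Cut the same three pieces by fixed column offsets (hypothetical helper).

    Offsets assume the widths shown by the marker line:
    skip 16, take 12, skip 16, take 2, skip 7, take 8.
    """
    return [line[16:28], line[44:46], line[53:61]]


fields = extract_fields(record)
print(fields)
print(fixed_slices(record))
```

If the regex and the fixed-offset slices agree on the sample, the pattern matches the columns marked in the question. A record whose middle token is not exactly three characters wide would make the two disagree, which is the trade-off of using \s+\S+\s+ rather than a fixed skip.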
I'm not sure, but you may have to make an addition to fields.conf as well. See;
Hope this helps,
K
Actually, Splunk is adept at finding anything that can be recognized with a regex. In other words, if you can tell Splunk what makes the part you want to keep unique, it will find it for you.