Knowledge Management

How do I create indexed fields in a summary index?

Lowell
Super Champion

I'm populating a summary index with data that I would like to be able to search very quickly using tstats. I've got this mostly working but can't quite seem to figure out if I'm doing something wrong or why it isn't working as expected.

Summary index generating search: search_foo
Fields to index: a, b, c, d
I want to be able to write a search like this: | tstats sum(a), sum(b), values(c) WHERE index=summary source=search_foo by d

Here are the settings I'm trying to make work:

props.conf:

[source::search_foo]
TRANSFORMS-index-fields = search_foo_indexfields

transforms.conf:

[search_foo_indexfields]
REGEX = \b(a|b|c|d)=("?)([^"]*?)\2(?:,|$)
FORMAT = $1::$3
WRITE_META = true
REPEAT_MATCH = true

I know that I have all the names and meta settings correct because the first field does get added as an indexed field. (I confirmed this by running exporttool -csv on one of the buckets and checking that the field showed up in the _meta field.) Splunk seems to be ignoring the REPEAT_MATCH setting.

So as a workaround, I've made REGEX match all 4 fields directly and index them all at once (e.g., FORMAT = a::$1 b::$2 c::$3 d::$4). This works, but I really don't like the approach because it assumes a hard-coded order of the fields, which seems unnecessarily fragile. In my actual use case, sometimes "a" or "b" is missing from the data. I've been able to make the regex cope with that, but it still results in an empty indexed field. (In other words, if "b" is missing from the data, I still see b:: in _meta when I run exporttool.) I also considered making 4 transforms entries, one for each field, but that seems silly as well.
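For illustration, here is a minimal Python sketch of the result the REPEAT_MATCH version is meant to produce: one key::value pair per field that is actually present, in whatever order the fields appear. Python's re module is only a stand-in for Splunk's regex engine, and the sample event and the extract_meta helper are made up for this example.

```python
import re

# Same pattern as the REGEX setting in the transforms.conf stanza above.
# Python's re is only a stand-in here; it is not Splunk's index-time engine.
FIELD_RE = re.compile(r'\b(a|b|c|d)=("?)([^"]*?)\2(?:,|$)')


def extract_meta(raw):
    """Return one 'key::value' string per match, like FORMAT = $1::$3 applied repeatedly."""
    return [f"{m.group(1)}::{m.group(3)}" for m in FIELD_RE.finditer(raw)]


# Hypothetical summary event in which field "b" is absent.
event = 'a=5, c="web", d=host1'
print(extract_meta(event))
```

With this approach a missing "b" simply produces no entry, rather than an empty b:: value.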

Bonus question, somewhat related: how do I avoid double-escaping backslashes in my solution? One of my actual fields is "source", so Windows paths show up in the raw data with escaped backslashes ( \\ ), which get translated to double-escaped backslashes ( \\\\ ) in the _meta field. That means that at search time the indexed fields look like "C:\\Windows\\.." instead of "C:\Windows\..".
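The layering described here can be modelled with a short Python sketch. It only mimics the doubling as described in this post and makes no claim about how Splunk handles escaping internally; the path and the two helper functions are hypothetical.

```python
def escape_backslashes(value):
    """Double every backslash (one escaping layer)."""
    return value.replace("\\", "\\\\")


def unescape_backslashes(value):
    """Collapse every doubled backslash back to a single one (remove one layer)."""
    return value.replace("\\\\", "\\")


path = r"C:\Windows\System32"    # hypothetical original value
raw = escape_backslashes(path)   # escaped once, as in the summary event's raw text
meta = escape_backslashes(raw)   # escaped again, as seen in _meta

print(path)                        # single backslashes
print(raw)                         # doubled
print(meta)                        # quadrupled
print(unescape_backslashes(meta))  # one layer removed: still doubled
```

Removing a single layer from the doubly escaped value leaves the doubled form, which matches what shows up at search time.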

0 Karma

woodcock
Esteemed Legend

Double-check that the source value for the data in your Summary Index matches your stanza header specification.

0 Karma

woodcock
Esteemed Legend

Many people do not know about _KEY_1 and _VAL_1 (you can search on it). Try this:

[search_foo_indexfields]
REGEX = \b(?<_KEY_1>a|b|c|d)=("?)(?<_VAL_1>[^"]*?)\2(?:,|$)
WRITE_META = true
MV_ADD = true
0 Karma

Lowell
Super Champion

Okay, so this adds a new field with the name of the transforms stanza ("search_foo_indexfields") with the value of either "a" or "b".

Just confirmed it in the _meta field dumped out with exporttool. "... date_mday::25 date_zone::0 search_foo_indexfields::a"

From the docs, it's not 100% clear whether _KEY_x and _VAL_x are supported at index time, but they don't seem to be working.

0 Karma

woodcock
Esteemed Legend

You have to deploy these configurations to the INDEXING SERVER. In most cases this is your indexers HOWEVER in the case of Summary Indices, by default (unless you went out of your way to change it), these are stored on the SEARCH HEAD so you will have to EITHER deploy the configurations to the Search Head OR make sure that Summary Indexing happens on the Indexers.

0 Karma

Lowell
Super Champion

I assume _KEY_! is a typo for _KEY_1? I was aware of that syntax, but didn't think it held any advantages here. (But I'll give it a try.) I haven't tried MV_ADD as the docs say, "This attribute is only valid for search-time field extractions."

0 Karma

woodcock
Esteemed Legend

Yes, fixed.

0 Karma

somesoni2
Revered Legend

How about creating a separate TRANSFORMS stanza for each field, so that even if one field is missing, the others show up independently?
For the double escaping, maybe try applying some command in the summary index search to remove the escaped backslashes.
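If hand-writing one stanza per field gets tedious, the stanzas could be generated. Below is a minimal Python sketch that prints a per-field transforms.conf layout plus the matching props.conf entry. The stanza name prefix search_foo_indexfield_ and the helper functions are assumptions made for this example, and the field names are assumed to be plain words that need no regex escaping.

```python
FIELDS = ["a", "b", "c", "d"]
PREFIX = "search_foo_indexfield_"  # hypothetical stanza name prefix


def build_transforms(fields, prefix=PREFIX):
    """Return transforms.conf text with one index-time stanza per field."""
    stanzas = []
    for name in fields:
        stanzas.append(
            f"[{prefix}{name}]\n"
            # Group 1 is the optional quote, group 2 is the value.
            f'REGEX = \\b{name}=("?)([^"]*?)\\1(?:,|$)\n'
            f"FORMAT = {name}::$2\n"
            "WRITE_META = true\n"
        )
    return "\n".join(stanzas)


def build_props(fields, source="search_foo", prefix=PREFIX):
    """Return the props.conf stanza that lists every generated transform."""
    names = ", ".join(prefix + name for name in fields)
    return f"[source::{source}]\nTRANSFORMS-index-fields = {names}\n"


print(build_props(FIELDS))
print(build_transforms(FIELDS))
```

Because each stanza's REGEX only matches when its own field is present, a missing field writes nothing to _meta.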

0 Karma

Lowell
Super Champion

I'd like to avoid one transforms stanza per field if possible. My real use case has more than just 4 fields. (Not an unmanageable number, it just seems like there has to be a better solution.)

I'm pretty sure the backslash escaping is happening automatically in the summary indexing plumbing commands. (I'm just using the default built-in alert actions for summary indexing.) In fact, I'm already dealing with escaped backslashes in part of my search, so I know they've been taken care of in my base search.

And yes I could remove them at search time, but since I'm in control of the data generation, it seems silly to deal with something in every search I write, if I could fix the issue once when the data is written.

0 Karma

somesoni2
Revered Legend

Give this a try?

[search_foo_indexfields]
REGEX = \b(?<_KEY_1>(?:a|b|c|d))=("?)(?<_VAL_1>[^"]*?)\2(?:,|$)
WRITE_META = true
REPEAT_MATCH = true
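A general caveat when combining named groups with a numbered backreference such as \2: named groups are numbered too, so an extra pair of capturing parentheses shifts which group \2 points at. The Python sketch below shows the effect. Python spells named groups (?P<name>...) instead of (?<name>...), its re module is only a stand-in for Splunk's regex engine, and the sample event is made up.

```python
import re

# Extra capturing parentheses around the alternation: group 2 is now the key,
# so \2 no longer refers to the optional quote.
SHIFTED = re.compile(r'\b(?P<_KEY_1>(a|b|c|d))=("?)(?P<_VAL_1>[^"]*?)\2(?:,|$)')

# Non-capturing alternation: group 2 is the optional quote again.
ALIGNED = re.compile(r'\b(?P<_KEY_1>(?:a|b|c|d))=("?)(?P<_VAL_1>[^"]*?)\2(?:,|$)')

sample = 'a=5, c="web"'  # hypothetical event

print(SHIFTED.search(sample))              # no match on this sample
print(ALIGNED.search(sample).groupdict())  # key and value of the first field
```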
0 Karma