
Performantly overriding sourcetype per event with new replacement string, not backreference?

Graham_Hanningt
Builder

I know how to use Splunk 7.3.0 to override the source type per event using a backreference. For example, given this snippet of incoming JSON Lines:

"code":"red"

I can do this in transforms.conf:

REGEX = \"code\":\"([^\"]+)\"
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype

Code "red" in the incoming JSON Lines event data sets the event source type to "red".

But suppose I don't want to use the value of code as the sourcetype? Suppose I want to map each code value to a completely different sourcetype value? Perhaps each incoming code value uniquely identifies a different source type, but the actual code value is not Splunk-y enough to be a sourcetype value? Although, I don't want to get into sourcetype naming conventions here.

The only way I have thought of doing this so far is to create a stanza for each code value. For example, in transforms.conf (these code and sourcetype values are fictitious):

[set_sourcetype_test_red]
REGEX = \"code\":\"red\"
FORMAT = sourcetype::scarlet
DEST_KEY = MetaData:Sourcetype

[set_sourcetype_test_green]
REGEX = \"code\":\"green\"
FORMAT = sourcetype::emerald
DEST_KEY = MetaData:Sourcetype

[set_sourcetype_test_blue]
REGEX = \"code\":\"blue\"
FORMAT = sourcetype::aqua
DEST_KEY = MetaData:Sourcetype

and in props.conf:

TRANSFORMS-changesourcetype = set_sourcetype_test_red, set_sourcetype_test_green, set_sourcetype_test_blue

Codes "red", "green", and "blue" become source types "scarlet", "emerald", and "aqua".

I don't like this multi-stanza technique. I currently have only half a dozen or so source types in this context, but I might end up with many more.

Can anyone suggest a more concise, more performant technique; say, a single stanza with a single regex? I can't see how to do it.

For the purposes of this question:

  • The different code values are all arriving at the same Splunk input (for example, TCP port)
  • I know what all the code values are (although, a fallback transform that uses a backreference for unexpected code values would be useful)
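
A sketch of how such a fallback could sit alongside the multi-stanza approach. It assumes that the transforms named in a single TRANSFORMS-<class> setting run in the order listed, and that a later transform that matches overwrites the MetaData:Sourcetype value written by an earlier one. The names set_sourcetype_fallback and my_json_lines are hypothetical.

```ini
# transforms.conf
# Fallback: use the code value itself as the sourcetype.
[set_sourcetype_fallback]
REGEX = \"code\":\"([^\"]+)\"
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype

# props.conf
# my_json_lines is a hypothetical stanza name for the sourcetype
# assigned at the input.
[my_json_lines]
# The fallback is listed first, so that a specific mapping that matches
# runs later and has the final say.
TRANSFORMS-changesourcetype = set_sourcetype_fallback, set_sourcetype_test_red, set_sourcetype_test_green, set_sourcetype_test_blue
```

Under those assumptions, an event with code "red" ends up as "scarlet", while an event with an unmapped code value keeps that value as its sourcetype.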

I notice that the Splunk docs contain the PCRE2 license, but the transforms.conf docs don't appear to mention any PCRE2-specific functionality, and anyway, I'm not even sure whether PCRE2-level substitution features would be of help here.

1 Solution

woodcock
Esteemed Legend

You could use INGEST_EVAL with a case statement to facilitate this.


0 Karma

Graham_Hanningt
Builder

I've just submitted the following feedback on the Splunk 7.3.0 docs page for transforms.conf:


I've seen that Splunk docs cite the PCRE2 license, so I'd hoped that regex replacement in transforms.conf would support PCRE2 replacements. Apparently not :-(, hence this feedback.

The following settings:

[set_sourcetype_test_pcre2]
REGEX = \"code\":\"(?<red>red)|(?<green>green)|(?<blue>blue)|(?<other>[^\"]+)\"
FORMAT = sourcetype::${red:+scarlet:}${green:+emerald:}${blue:+aqua:}

with input JSON Lines snippet such as:

"code":"red"

results in a sourcetype value of, literally:

${red:+scarlet:}${green:+emerald:}${blue:+aqua:}

That is, regex processing in Splunk appears not to recognize the PCRE2 replacement syntax.

Or perhaps I'm doing something wrong.

Here's what I want to happen: if the code property value is "red", then set sourcetype to "scarlet"; if code "green", set sourcetype "emerald"; if code "blue", set sourcetype "aqua".

For more details, see my related question in Splunk Answers, "Performantly overriding sourcetype per event with new replacement string, not backreference?".


By "doing something wrong", I mean, for example: if the named capture group "red" is unset, then I want the replacement value to be an empty string, hence the lack of a string after the second colon. However, I'm unsure whether PCRE2 allows this, or whether I need to specify "something" as the replacement string.

0 Karma

Graham_Hanningt
Builder

A Splunk docs contact has responded to my feedback (thank you!), and confirmed that, as of Splunk 8.0.0, Splunk doesn't support functions specific to PCRE2, such as these substitution functions.

0 Karma

woodcock
Esteemed Legend

You could use INGEST_EVAL with a case statement to facilitate this.

0 Karma

Graham_Hanningt
Builder

Yes!

This works:

[set_sourcetype]
INGEST_EVAL = sourcetype:=case(match(_raw, "\"code\":\"red\""), "scarlet", match(_raw, "\"code\":\"green\""), "emerald", match(_raw, "\"code\":\"blue\""), "aqua", true(), "other")

Thank you for your answer. My apologies for this belated comment.

I don't like the repetition of match(_raw, ... ) in my case function, though.

Here's a variation that extracts the code value into sourcetype in one transform, and then refers to that "temporary" sourcetype in the INGEST_EVAL in a second transform:

[get_sourcetype_from_code]
REGEX = \"code\":\"([^\"]+)\"
FORMAT = sourcetype::$1
DEST_KEY = MetaData:Sourcetype

[set_sourcetype]
INGEST_EVAL = sourcetype:=case(sourcetype=="red", "scarlet", sourcetype=="green", "emerald", sourcetype=="blue", "aqua", true(), sourcetype)

(Requires props.conf to refer to the two transforms in sequence. For example: TRANSFORMS-changesourcetype = get_sourcetype_from_code,set_sourcetype.)

woodcock
Esteemed Legend

VERY nicely done! I like it.

0 Karma

Graham_Hanningt
Builder

Incidental observation: the example set_sourcetype stanza in my previous comment (deliberately) doesn't specify a REGEX setting. splunkd reports this omission as an error:

ERROR regexExtractionProcessor - REGEX field must be specified tranform_name=set_sourcetype

My opinion: this error is a bug. In practice, a REGEX is not required for this stanza.

Nit: Splunk, please correct the typo tranform_name (sic; note the missing "s") in the error text.

0 Karma

Graham_Hanningt
Builder

Perhaps I'm trying too hard to be Splunk-y by attempting to map each of these incoming code values to a different sourcetype value. I could simply forget about overriding the source type per event, set a fixed sourcetype, and, in my searches, where I currently refer only to sourcetype, refer instead to both sourcetype and code. (I didn't mention this in the question, but I typically use a transform to remove code after using it to override sourcetype.) I typically place such search snippets in macros, anyway, to isolate my dashboard Simple XML from such issues.

Not overriding the source type would mean that, if the data is ingested by uploading from a file on my computer, the search that Splunk Web offers for the newly uploaded data will actually find results!
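
A minimal sketch of the macro approach described above, assuming a fixed sourcetype named my_json_lines and a search-time field named code. The sourcetype name and the macro name events_for_code are hypothetical; args and definition are the standard macros.conf settings.

```ini
# macros.conf
# events_for_code and my_json_lines are hypothetical names.
# The (1) suffix declares that the macro takes one argument.
[events_for_code(1)]
args = code
definition = sourcetype=my_json_lines code="$code$"
```

A dashboard search would then call `events_for_code(red)` (including the backticks) instead of referring to a per-code sourcetype directly, so a later change to the sourcetype scheme only touches the macro definition.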

0 Karma

adonio
Ultra Champion

not sure if your comment is an answer ...
can you elaborate on the problem you are trying to solve? what is it that you would like to achieve?

0 Karma

Graham_Hanningt
Builder

Hi adonio,

My question includes an answer, but, as I wrote, I don't like the technique it uses. My first comment after the question describes a workaround, rather than an answer: abandoning the idea of a granular sourcetype field, and instead relying on a fixed, generic sourcetype field combined with a separate code field.

"can you elaborate on the problem you are trying to solve?"

I want to use a value in incoming JSON Lines data to set sourcetype per event. The value in the incoming data and the sourcetype are completely different.

"what is it that you would like to achieve?"

A more performant solution than the one I have now. Suppose I have 20 source types. Using my current technique, that means 20 separate stanzas in transforms.conf. I'm hoping for something more elegant and concise; and I'm hoping that this also means "more performant" (faster; less index-time processing for the transform).

I was hoping that PCRE2 replacement syntax might work; see my recent related comment on this question.

0 Karma