Is there a REGEX character limit on field replacement in transforms.conf?

heath
Path Finder

I have data in JSON format, but I only want to keep the value of the MESSAGE field. I created a transform to extract the value of MESSAGE and put it in _raw. This works fine until the value exceeds about 680 characters; then it stops working. I can't figure out which limit I'm hitting so that I can increase it. Any ideas?

transforms.conf:

[replace-raw-with-message]
DEST_KEY = _raw
REGEX = MESSAGE": \"((\\.|[^\"])*)\"
FORMAT = $1

Our data looks like this. If the value in MESSAGE is over about 680 characters the replacement no longer works.
{ "MESSAGE": "the value" }


krizi
Loves-to-Learn

A long time overdue, but just in case anyone else stumbles upon this issue: try adding LOOKAHEAD with a larger number (e.g. 10000) to your transform stanza. You can find out more in the transforms.conf documentation.
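
As a rough sketch of what that could look like, here is the stanza from the question with LOOKAHEAD added. The value 10000 is only the example number mentioned above, not a tuned recommendation. As far as I know, LOOKAHEAD defaults to 4096 characters and applies to index-time transforms, so check the transforms.conf spec for your version before relying on this.

```
[replace-raw-with-message]
# Replace _raw with the captured MESSAGE value
DEST_KEY = _raw
REGEX = MESSAGE": \"((\\.|[^\"])*)\"
FORMAT = $1
# Assumption: 10000 is an example value; size it to your longest event
LOOKAHEAD = 10000
```

A change like this normally needs a restart (or a reload of the affected input) and only affects data indexed afterwards.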

 


DalJeanis
SplunkTrust

Updated - never mind, found it.

This part of your regex (\\.|[^\"])*, after the escaped slash is resolved, reads in English

" a group made up of any number from 0 to infinity of either an actual period OR something that isn't a quote."

Well, first, since an actual period is not a quote, the OR is a redundant option that costs extra work. Since you put the period test first, any time there's an actual period, Splunk has to keep track of that point as an option it may need to backtrack to.

Try this -

 MESSAGE": \"([^\"]*)\"

Chances are pretty good that you are looking at a catastrophic backtracking failure when the machine runs out of memory to figure out your data.

If you post a nonconfidential example of the data, then we may be able to revise your regex to avoid the problem.

heath
Path Finder

The value of MESSAGE can contain JSON data with escaped double quotes. The original regex will include them. For example:
"MESSAGE": "{ \"thefield\":\"thevalue\"}"

The \\. was intended to catch all escaped characters. I tried changing it to \\" to catch only escaped double quotes, but ran into the same issue. Like this:
REGEX = MESSAGE": \"((\\"|[^\"])*)\"

Any other thoughts? Can I allocate more memory somehow?
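
For anyone landing here later, one possible sketch that keeps the escaped quotes while giving the regex engine less to backtrack over is the "unrolled" form below. It matches runs of ordinary characters in one step and only handles a backslash escape when it actually meets one, using a non-capturing group instead of a capturing alternation. This is untested against real data and assumes every backslash inside MESSAGE is followed by exactly one escaped character. Whether it gets past the roughly 680 character point depends on which limit is actually being hit.

```
[replace-raw-with-message]
DEST_KEY = _raw
# [^"\\]*          a run of characters that are neither a quote nor a backslash
# (?:\\.[^"\\]*)*  a backslash escape, then another ordinary run, repeated
REGEX = MESSAGE": "([^"\\]*(?:\\.[^"\\]*)*)"
FORMAT = $1
```

Against the example above, $1 would come out as { \"thefield\":\"thevalue\"} with the backslashes still in place, the same as with the original regex.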


heath
Path Finder

You can put anything in there. I was able to reproduce it on a test instance by putting 700 numbers in as the data, like this:

{ "MESSAGE": "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" }
