Splunk Search

How can I remove text from _raw if it appears as a field in Splunk

cpeteman
Contributor

I want to remove from _raw any string that also appears as the value of a field in Splunk, say host. For example, if I have the _raw message:

<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.
2013-08-15 14:25:48 Setting hostname hype362: [ OK ]

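The following search strips all digits (and with them the dates and times) as well as month and weekday names from _raw. The full month names are listed before their abbreviations so that, for example, "January" is removed whole instead of leaving "uary" behind:

| rex mode=sed "s/\d{1,}//g" | rex mode=sed "s/(January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec|Mon|Tue|Wed|Thu|Fri|Sat|Sun)//g" | rename _raw AS msgdigest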

So the msgdigest then becomes:

<ConMan> Console [hype] log at -- :: PDT.
-- :: Setting hostname hype: [ OK ]
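
As a quick check outside Splunk, the two substitutions can be sketched in Python. This is only an illustration of the regex logic, not SPL; the function name msgdigest is borrowed from the renamed field, and Python's re module is assumed to behave like rex mode=sed for these simple patterns.

```python
import re

# Full names come before abbreviations so "January" is not cut down to "uary".
MONTHS_AND_DAYS = (
    r"(January|February|March|April|May|June|July|August|September|October|"
    r"November|December|Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec|"
    r"Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
)


def msgdigest(raw: str) -> str:
    """Strip digits, then month and weekday names, from a raw event."""
    no_digits = re.sub(r"\d+", "", raw)
    return re.sub(MONTHS_AND_DAYS, "", no_digits)


raw = (
    "<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.\n"
    "2013-08-15 14:25:48 Setting hostname hype362: [ OK ]"
)
digest = msgdigest(raw)
print(digest)
```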

That is my digest of _raw so far. Since hype is the value of a field (say, a type of host), I want to end up with:

<ConMan> Console [] log at -- :: PDT.
-- :: Setting hostname: [ OK ]

The final goal here is to create a digest of _raw that has more detail than punct, as I find that errors that are not actually similar sometimes have the same punct. So I am making a hybrid of _raw and punct, so to speak. I may try to make this available as an app in the long run.

using
Explorer

I feel as though Splunk needs to have an easy way to identify values of a field inside of regex (added on to just perl re). This would make it easier to do a lot of things or at least give us more options.
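
To make the idea concrete, here is a minimal Python sketch (not SPL) of building a regex from field values and removing every match from the digest. The helper strip_field_values and the field value host=hype33 are assumptions for illustration; the thread only says that hype is "a type of host".

```python
import re


def strip_field_values(digest: str, fields: dict) -> str:
    """Remove each field value from a digest whose digits were already stripped."""
    for value in fields.values():
        # The digest no longer contains digits, so drop them from the value too.
        token = re.sub(r"\d+", "", value)
        if token:
            # re.escape keeps characters like "." or "[" in a value literal.
            digest = re.sub(re.escape(token), "", digest)
    return digest


digest = (
    "<ConMan> Console [hype] log at -- :: PDT.\n"
    "-- :: Setting hostname hype: [ OK ]"
)
fields = {"host": "hype33"}  # hypothetical field value
cleaned = strip_field_values(digest, fields)
print(cleaned)
```

Note that this leaves the space before the colon in "hostname :", and a short field value could also match inside unrelated words, so a real implementation would probably need word boundaries.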

bmacias84
Champion

You will need to use transforms.conf and props.conf. You capture the parts you want to keep, exclude the values you don't want, and apply it at search time with REPORT. I didn't check my regex, but this should give you some ideas.


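# transforms.conf
[data-anonymizer]
# Group 1: everything up to and including the non-digit part of the bracketed name.
# Group 2: the closing bracket and the rest of the line. The digits in between are dropped.
REGEX = (?m)^(.*\[[^\d\]]+)\d+(\].*)
FORMAT = $1$2
DEST_KEY = _raw


# props.conf
[yoursource]
REPORT-anonymizer = data-anonymizer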

Hope this helps or gives you some ideas. Don't forget to vote and accept answers that help.

Cheers

bmacias84
Champion

I tried to create two capture groups:
$1 = Console [hype
$2 = ] log at 2013-08-15 00:00:00 PDT.
2013-08-15 14:25:48 Setting hostname hype362: [ OK ]
Together the two capture groups exclude the value 33. Using FORMAT = $1$2 to replace _raw, the event should contain the whole event minus the 33. I know this works during the indexing phase, and it should work at search time as well.
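
The two-group idea can be tried outside Splunk with a small Python sketch. The pattern below is an assumption, a variant of the posted REGEX with the name part written as [^\d\]]+ so that it can match "hype" before the digits, and \1\2 stands in for FORMAT = $1$2. It says nothing about how Splunk applies the transform.

```python
import re

# Group 1: start of line through the non-digit part of the bracketed name.
# Group 2: the closing bracket and the rest of that line.
PATTERN = re.compile(r"^(.*\[[^\d\]]+)\d+(\].*)", re.MULTILINE)

raw = (
    "<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.\n"
    "2013-08-15 14:25:48 Setting hostname hype362: [ OK ]"
)

match = PATTERN.search(raw)
print(match.group(1))
print(match.group(2))

# Equivalent of FORMAT = $1$2: keep both groups, drop the digits between them.
rewritten = PATTERN.sub(r"\1\2", raw)
print(rewritten)
```

In this sketch the dot does not match newlines, so each group stays within the first line and the second line passes through unchanged; "hype362" there is not in brackets and is not touched.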

cpeteman
Contributor

Can you explain a little more what you are doing with the regex?

cpeteman
Contributor

Fear not, I'm in the process of getting access to those files so it may take a day or two.

lcrielaa
Communicator

If I assume correctly, you want to remove whatever's between the []. In your example of

Console [hype] log at -- :: PDT.
You want to get rid of the word hype.

I ran the following regex for you at http://gskinner.com/RegExr/ on the above line:

(?<=\[).*?(?=\])

This uses a positive lookbehind and a positive lookahead to search for the first [ and the first ] symbol and select everything in between. You could use this to do a find/replace and replace the text selected by the regex with nothing to get rid of it.
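
A short Python sketch of that find/replace, assuming Python's re module handles this lookbehind and lookahead the same way RegExr does. The helper name clear_brackets is hypothetical.

```python
import re

# Match whatever sits between "[" and the next "]", without the brackets themselves.
BETWEEN_BRACKETS = re.compile(r"(?<=\[).*?(?=\])")


def clear_brackets(text: str) -> str:
    """Replace the contents of every [...] pair with nothing."""
    return BETWEEN_BRACKETS.sub("", text)


first = clear_brackets("Console [hype] log at -- :: PDT.")
second = clear_brackets("-- :: Setting hostname hype: [ OK ]")
print(first)
print(second)
```

The second call shows the limits of the approach: it empties "[ OK ]" as well and leaves the unbracketed "hype" in place.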

cpeteman
Contributor

I'm afraid this will only cover a few cases. The [] do not always have anything to do with the field, and I also want to get rid of things like user names, which are never in brackets. Thanks for trying.
