How can I remove text from _raw if it appears as a field in Splunk

cpeteman
Contributor

I want to remove from _raw any string that also appears as the value of a field in Splunk, say host. For example, if I have the _raw message:

<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.
2013-08-15 14:25:48 Setting hostname hype362: [ OK ]

The following search strips all digits (and with them the date and time values), plus month and weekday names, from _raw:

| rex mode=sed "s/\d+//g" | rex mode=sed "s/(January|February|March|April|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|Mon|Tue|Wed|Thu|Fri|Sat|Sun)//g" | rename _raw AS msgdigest

So the msgdigest then becomes:

<ConMan> Console [hype] log at -- :: PDT.
-- :: Setting hostname hype: [ OK ]
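
For readers who want to try the two substitutions outside Splunk, here is a minimal Python sketch of the same idea. The function name `make_digest` is made up for illustration, and Python's `re` module stands in for `rex mode=sed`; it is not how Splunk runs the search internally.

```python
import re

# Longer names come first so that "January" is not reduced to "uary" by "Jan".
DATE_WORDS = re.compile(
    r"(January|February|March|April|June|July|August|September|October|"
    r"November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|"
    r"Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
)


def make_digest(raw: str) -> str:
    """Strip digits, then month and weekday names, from an event string."""
    without_digits = re.sub(r"\d+", "", raw)
    return DATE_WORDS.sub("", without_digits)


raw = (
    "<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.\n"
    "2013-08-15 14:25:48 Setting hostname hype362: [ OK ]"
)
msgdigest = make_digest(raw)
print(msgdigest)
```

Because the name pattern has no word boundaries, it also trims those letters out of longer words (for example "Monitor" would lose "Mon"), which may or may not be acceptable for a digest.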

That is my digested message. Since hype is also the value of a field (say, host), I want to end up with:

<ConMan> Console [] log at -- :: PDT.
-- :: Setting hostname: [ OK ]
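
The transformation being asked for can be sketched in Python as follows. This only illustrates the desired result; it is not a Splunk solution. The helper `strip_field_values` and the list of values are hypothetical, and removing one optional leading space is an assumption made so the output matches the example above.

```python
import re


def strip_field_values(text: str, values: list[str]) -> str:
    """Remove each known field value, plus one optional leading space or tab."""
    # Longest values first, so a short value cannot split a longer one.
    for value in sorted(values, key=len, reverse=True):
        text = re.sub(r"[ \t]?" + re.escape(value), "", text)
    return text


digest = (
    "<ConMan> Console [hype] log at -- :: PDT.\n"
    "-- :: Setting hostname hype: [ OK ]"
)
stripped = strip_field_values(digest, ["hype"])
print(stripped)
```

`re.escape` keeps characters such as dots in a host name from being read as regex syntax.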

The final goal here is to create a digest of _raw that has more detail than punct, as I find that errors that are not actually similar sometimes have the same punct. So I am making a hybrid of _raw and punct, so to speak. I may try to make this available as an app in the long run.

using
Explorer

I feel Splunk needs an easy way to reference the values of a field inside a regex (as an addition on top of plain Perl-style regular expressions). This would make a lot of things easier, or at least give us more options.

bmacias84
Champion

You will need to use transforms.conf and props.conf. You capture the parts you want to keep, leave the values you don't want outside the capture groups, and apply it at search time with REPORT. I didn't check my regex, but this should give you some ideas.


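# transforms.conf
[data-anonymizer]
# Group 1: everything up to the digits inside the first [...]; group 2: the rest of the event.
REGEX = (?s)^(.*?\[[^\d\]]*)\d+(\].*)
FORMAT = $1$2
# DEST_KEY is documented as an index-time setting; if REPORT- has no effect, try TRANSFORMS- instead.
DEST_KEY = _raw


# props.conf
[yoursource]
REPORT-anonymizer = data-anonymizer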

Hope this helps or gives you some ideas. Don't forget to vote and accept answers that help.

Cheers

bmacias84
Champion

I tried to create two capture groups:

$1 = <ConMan> Console [hype
$2 = ] log at 2013-08-15 00:00:00 PDT.
2013-08-15 14:25:48 Setting hostname hype362: [ OK ]

The two capture groups exclude the value 33. Using FORMAT = $1$2 to replace _raw, the event should contain the whole event except for the 33. I know this works during the indexing phase, and it should work at search time too.
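
The two-capture-group idea can be checked outside Splunk with a short Python sketch. The pattern below is an assumption written in the spirit of the answer (one group before the digits, one after), and Python's `\1\2` plays the role of `FORMAT = $1$2`. It says nothing about whether Splunk applies the transform at search time.

```python
import re

# (?s) lets "." cross line breaks, so group 2 keeps the rest of a multi-line event.
TWO_GROUPS = re.compile(r"(?s)^(.*?\[[^\d\]]*)\d+(\].*)")

event = (
    "<ConMan> Console [hype33] log at 2013-08-15 00:00:00 PDT.\n"
    "2013-08-15 14:25:48 Setting hostname hype362: [ OK ]"
)

match = TWO_GROUPS.match(event)
group1, group2 = match.group(1), match.group(2)

# Re-join the two groups; the digits between them are dropped.
rewritten = TWO_GROUPS.sub(r"\1\2", event)
print(rewritten)
```

Only the digits inside the first bracket pair are removed; the 362 in the second line is untouched, because the pattern matches once per event.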

0 Karma

cpeteman
Contributor

Can you explain a little more what you are doing with the regex?

0 Karma

cpeteman
Contributor

Fear not, I'm in the process of getting access to those files so it may take a day or two.

0 Karma

lcrielaa
Communicator

If I assume correctly, you want to remove whatever's between the []. In your example of

Console [hype] log at -- :: PDT.
You want to get rid of the word hype.

I ran the following regex for you at http://gskinner.com/RegExr/ against the line above:

(?<=\[).*?(?=\])

This uses a positive lookbehind and a positive lookahead to search for the first [ and the first ] symbol and select everything in between. You could use this to do a find/replace and replace the text selected by the regex with nothing to get rid of it.
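
As a quick illustration of that find/replace, here is a minimal Python sketch using the same lookbehind/lookahead pattern. Python's `re.sub` stands in for whatever replace tool is used, and the variable names are made up for the example.

```python
import re

# Match whatever sits between "[" and the next "]", without consuming the brackets.
BETWEEN_BRACKETS = re.compile(r"(?<=\[).*?(?=\])")

line1 = "Console [hype] log at -- :: PDT."
line2 = "-- :: Setting hostname hype: [ OK ]"

cleaned1 = BETWEEN_BRACKETS.sub("", line1)
cleaned2 = BETWEEN_BRACKETS.sub("", line2)
print(cleaned1)
print(cleaned2)
```

On the second line the pattern empties `[ OK ]` and leaves the unbracketed `hype` in place, so it works on bracket position and not on the field value itself.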

0 Karma

cpeteman
Contributor

I'm afraid this will only cover a few cases. The [] do not always have anything to do with the field, and I also want to get rid of things like user names, which are never in brackets. Thanks for trying.

0 Karma