Splunk Search

How to exclude the events?

zacksoft
Contributor

I have a set of events, a few of which start with a three-digit number (for example, 200 23 45 dgdgdgd dhdhddh).
These are corrupt events, and I don't want to include them in my search.
How do I write a condition, a regex, or anything else that excludes an event if it starts with a three-digit number?


wrangler2x
Motivator

I agree with @niketnilay on dropping the bad data at the forwarder, but for the data you've already indexed you'll want a way to exclude it from your search.

It looks like your "good" data begins with an IP address. If that is so, then this will do the job:

... | regex _raw="^(\d{1,3}\.){3}\d{1,3}"

Where '...' is your base search.
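
To see why anchoring on a leading IP address works, the pattern can be tried outside Splunk. The sketch below is a minimal Python check and not part of the original reply: the two events are shortened versions of the samples in this thread, and `keep_good_events` is a hypothetical helper name. It assumes that Python's `re` module and Splunk's regex engine behave the same for a pattern this simple, which is worth confirming in your own environment.

```python
import re

# Pattern from the reply above: a "good" event begins with an IPv4-style prefix.
IP_PREFIX = re.compile(r"^(\d{1,3}\.){3}\d{1,3}")

# Shortened stand-ins for the events discussed in this thread.
GOOD_EVENT = '40.118.209.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET /rest HTTP/1.1" 200 49 2'
CORRUPT_EVENT = '200 49 2 "Mozilla/4.0 (compatible; MSIE 7.0)" "38b0i3"'


def keep_good_events(events):
    """Return only the events that start with an IP address."""
    return [event for event in events if IP_PREFIX.match(event)]


if __name__ == "__main__":
    for event in keep_good_events([GOOD_EVENT, CORRUPT_EVENT]):
        print(event)
```

Matching the good format, rather than the bad one, means any other malformed prefix is filtered out as well, at the cost of also hiding events whose first field is legitimately not an IP address.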

mayurr98
Super Champion

hey @zacksoft

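Based on your comments and the logic you described, you can capture the leading three-digit number in a field and then exclude every event that has that field. The \s after the digits keeps an IP address with a three-digit first octet (such as 192.168.1.1) from being treated as corrupt.

<your_search> | rex field=_raw "^(?<CorruptData>[0-9]{3})\s" | search NOT CorruptData=*

Let me know if this helps!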

zacksoft
Contributor

I never thought it would be possible to do this without stopping the bad data from being indexed.
But your query seems like magic, and it eliminates the need to wrestle with the Splunk admins to have them configure the indexer to stop indexing bad data. I'll try this and let you know. Thank you.


niketn
Legend

The difference is whether you want to take the filtering load once, at index time, or on every search, at search time. Dropping the data at index time only makes sense if the unwanted data is of no use.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

mayurr98
Super Champion

let me know and do not forget to accept/upvote if it works for you 😜


nickhills
Ultra Champion

Personally, I would use rex to match the events that start with the bad format, and then amend my search to keep only the events that don't have that field. If you do not want to index the bad data at all, use @niketnilay's approach instead.
Something like:

<your search> | rex field=_raw "^(?<corruptData>\d{3})\s" | search NOT corruptData=*

Use NOT corruptData=* rather than corruptData!=* here: the != form only matches events in which the field exists, so it would return nothing.

Updated to match your sample.

If my comment helps, please give it a thumbs up!

niketn
Legend

@zacksoft, it seems you are looking to keep the unwanted events from being indexed at all. For this, you need to pass your data through a heavy forwarder, which will have stanzas that route the unwanted events to nullQueue.

Edit props.conf and add the following:

[YourSourceType]
TRANSFORMS-dropSourceTypeEvents = setnull

Then edit transforms.conf and add the following:

[setnull]
# \s after the digits so that IPs such as 192.168.1.1 are not dropped
REGEX = ^\d{3}\s
DEST_KEY = queue
FORMAT = nullQueue

Refer to the documentation for details: http://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Routeandfilterdatad#Filter_event_data_...
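
Because events sent to nullQueue are discarded and cannot be recovered, it may be worth testing the drop regex offline before deploying it. The sketch below uses Python rather than Splunk and compares a bare `^\d{3}` with a stricter `^\d{3}\s`. The `192.168.1.10` event is a made-up example, `would_drop` is a hypothetical helper, and the sketch assumes Python's `re` module treats these simple patterns the same way Splunk does.

```python
import re

LOOSE = re.compile(r"^\d{3}")     # three leading digits, anything afterwards
STRICT = re.compile(r"^\d{3}\s")  # three leading digits followed by whitespace

CORRUPT_EVENT = '200 49 2 "Mozilla/4.0 (compatible; MSIE 7.0)" "38b0i3"'
GOOD_SHORT_OCTET = "40.118.209.1 0x735870x1 GG46989"
GOOD_LONG_OCTET = "192.168.1.10 0x735870x1 GG46989"  # hypothetical event


def would_drop(pattern, event):
    """Return True if an event matching this pattern would be discarded."""
    return pattern.search(event) is not None


if __name__ == "__main__":
    for name, pattern in (("loose", LOOSE), ("strict", STRICT)):
        for event in (CORRUPT_EVENT, GOOD_SHORT_OCTET, GOOD_LONG_OCTET):
            print(name, would_drop(pattern, event), event[:20])
```

In this sketch the loose pattern also flags the event that begins with 192.168.1.10, while the strict pattern flags only the corrupt event, so requiring whitespace after the three digits looks like the safer choice for a destructive filter.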


zacksoft
Contributor

Does it have to pass through a heavy forwarder? Won't a universal forwarder do?
And we can set up conditions in the stanza that will keep bad data from being indexed?

Is my understanding correct?


nickhills
Ultra Champion

@zacksoft - you could do this on your indexer if you don't have a HF.


zacksoft
Contributor

Thanks @nickhillscpl. I'll work with the Splunk admins to configure the setup that @niketnilay suggested in the transforms.conf file.


nickhills
Ultra Champion

This is a good idea, if you don't need/want to index the corrupt data.


kamlesh_vaghela
SplunkTrust

Hi @zacksoft ,
Can you please share some sample events, both ones that need to be included and ones that do not?


zacksoft
Contributor

It won't allow me to post any sample events, so let me simplify the events and type them out.

The events that should be accepted look like this:
40.118.209.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET /rest/jphutenxporter/1.0/outputformatconfig/outputformatselected?_=1513833400783 HTTP/1.1" 200 49 2 "htssphuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 3.0.50727; .NET CLR 3.6.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)" "38b0i3"

The events that shouldn't be accepted look like this:
200 49 2 "htssphuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 3.0.50727; .NET CLR 3.6.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)" "38b0i3"

I think that if we can identify the events that start with a three-digit number (like 200, 201, 401, etc.) and exclude them, that may work.


zacksoft
Contributor

The following is an example of an event that should be included:
40.118.209.1 0x735870x1 GG46989 [21/Dec/2014:00:00:00 -0500] "GET /rest/jphutenxporter/1.0/outputformatconfig/outputformatselected?_=1513833400783 HTTP/1.1" 200 49 2 "https://phuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 3.0.50727; .NET CLR 3.6.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)" "38b0i3"

The following is an example of an event that should NOT be included:
200 49 2 "https://phuten.mayhem.com/browse/UOAI-1536" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 3.0.50727; .NET CLR 3.6.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3)" "38b0i3"

The logic I could think of is this: if we see a three-digit number at the beginning of an event (such as 200, 201, 402, etc.), then we ought to exclude the event, as it is corrupt data.
