
Dedup in raw field

msmapper
Path Finder

When I write searches in Splunk, 90% of them are based on data that is only available in the _raw field, not in one of the indexed fields like host or sourcetype. My goal is to run a query that dedups on one portion of the _raw field: the value 34e6a6-6d0-4626-a319ce-24e6a63 in the event below.

May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]

I can write a regex for that value, but when I run the query below I still get duplicate values:

index=application sourcetype=web | regex _raw = "\w\w\w\w\w\w-\w\w\w\w-\w\w\w\w\w\w-\w\w\w\w\w\w\w" | dedup _raw

Can someone please let me know whether what I am trying to do is possible, and point me in the right direction?

Thanks in advance!!!
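Two things keep duplicates in that result set. First, the regex command only filters events whose _raw matches the pattern. It does not create a field. Second, dedup _raw compares the entire event text, and that text includes timestamps and client ports, so two events carrying the same ID almost never match exactly. The sketch below is plain Python, not Splunk, and it uses two hypothetical events modelled on the sample line. It shows the difference between deduplicating on the whole line and deduplicating on just the ID:

```python
import re

# Two hypothetical events: same ID, different timestamps and client ports.
events = [
    "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    "May 16 16:35:12 server1 16:35:12,004 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64530 order=[abcdefg]",
]

# Deduplicating on the whole line (what dedup _raw does) keeps both events,
# because the two lines are not identical.
unique_lines = list(dict.fromkeys(events))

# Deduplicating on the extracted ID alone collapses them to one value.
id_pattern = re.compile(r"\w{6}-\w{3}-\w{4}-\w{6}-\w{7}")
unique_ids = {id_pattern.search(e).group(0) for e in events}

print(len(unique_lines))  # 2
print(len(unique_ids))    # 1
```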

Accepted solution

Damien_Dallimor
Ultra Champion

For the supplied log example, this would work:

Note: this assumes that the format of the hex ID is consistent across different log events.

index=application sourcetype=web | rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})" | dedup hex_id
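The following rough Python emulation shows what that pipeline does to each event. It is not Splunk code, and the function names and sample events are hypothetical. Python's re module writes a named group as (?P<hex_id>...) instead of the PCRE form (?<hex_id>...) used in rex. The dedup step keeps the first event it sees for each value. It also drops events where the field was never extracted, which is what Splunk's dedup does by default (keepempty=false):

```python
import re

# Python spelling of the rex pattern: (?P<name>...) is Python's named-group syntax.
HEX_ID = re.compile(r"TIME\s(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})")

def rex(events):
    """Attach a 'hex_id' field to each event when the pattern matches (like rex)."""
    out = []
    for raw in events:
        fields = {"_raw": raw}
        m = HEX_ID.search(raw)
        if m:
            fields["hex_id"] = m.group("hex_id")
        out.append(fields)
    return out

def dedup(events, field):
    """Keep the first event per value of `field`; drop events that lack it."""
    seen = set()
    kept = []
    for ev in events:
        if field not in ev:
            continue  # mirrors dedup's default keepempty=false
        if ev[field] in seen:
            continue
        seen.add(ev[field])
        kept.append(ev)
    return kept

# Hypothetical events: two share an ID, one has no ID at all.
events = [
    "16:34:09,376 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 order=[a]",
    "16:35:12,004 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 order=[b]",
    "16:36:40,118 WARN Servlet - TIME 77aa01-1b2-9c3d-0e0f11-2233445 order=[c]",
    "16:37:02,500 INFO Servlet - startup complete",
]

result = dedup(rex(events), "hex_id")
for ev in result:
    print(ev["hex_id"])
```

The same default explains a dedup hex_id that returns zero results. If rex never populates hex_id, for example because the pattern matches no event, then every event lacks the field and every event is removed.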


msmapper
Path Finder

Damien,

Yes, the missing escape character must be due to a formatting issue in Splunkbase, because I am definitely using it. The overall rex statement also works fine, since the format is consistent across this particular log event. It's just that when I add "| dedup hex_id" to the query, I get zero results. I removed "TIME\s" from the query and everything worked correctly.

Thank you very much for your help.

Jen
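A literal prefix such as TIME\s restricts rex to events where exactly one whitespace character sits between "TIME" and the ID. The thread does not show which events differed. The Python sketch below uses hypothetical event layouts to illustrate how dropping the prefix widens what the pattern matches:

```python
import re

ID = r"\w{6}-\w{3}-\w{4}-\w{6}-\w{7}"
with_prefix = re.compile(r"TIME\s(?P<hex_id>" + ID + r")")
without_prefix = re.compile(r"(?P<hex_id>" + ID + r")")

# Hypothetical events: only the first has a single space between TIME and the ID.
events = [
    "WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 order=[a]",
    "WARN Servlet - TIME  77aa01-1b2-9c3d-0e0f11-2233445 order=[b]",  # two spaces
    "WARN Servlet - REQ 1a2b3c-4d5-6e7f-8a9b0c-1d2e3f4 order=[c]",    # no TIME
]

matched_with = [e for e in events if with_prefix.search(e)]
matched_without = [e for e in events if without_prefix.search(e)]
print(len(matched_with), len(matched_without))  # 1 3
```

The trade-off is that, without an anchor, the pattern can also match an ID-shaped string elsewhere in the event. Spot-check a few results after removing the prefix.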


msmapper
Path Finder

Damien,

Thank you for your response, but unfortunately it didn't work. When I run the query

index=application sourcetype=web | rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})"

I get about 200,000 results returned.

When I run

index=application sourcetype=web | rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})" | dedup hex_id

I get 0 results back.

Thoughts?


Damien_Dallimor
Ultra Champion

Your copy/paste above doesn't match my post: the escape character after "TIME" and before the "s" is missing. Maybe that's just a Splunkbase formatting quirk.

Furthermore, refer to the note in my original post: "assumes that the format of the hex ID is consistent across different log events." You only supplied one sample log event to work with, so if the pattern of the hex ID varies, the regex pattern will need to be altered.
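Suppose the segment lengths of the ID vary between events. One option is to match runs of hex digits separated by hyphens instead of fixed counts. This is an assumption, since only one sample event was posted, and so is the five-segment shape used below. Here is a Python sketch of such a pattern, tried on the posted sample and on a hypothetical longer ID:

```python
import re

# Assumed shape: five hyphen-separated runs of hex digits, each of any length.
FLEX_ID = re.compile(r"\b(?P<hex_id>[0-9a-fA-F]+(?:-[0-9a-fA-F]+){4})\b")

samples = [
    # The posted sample event.
    "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    # Hypothetical event with different segment lengths.
    "WARN Servlet - TIME 34e6a6f1-6d0a-4626-a319-24e6a63b9c0d 10.0.0.1:5000 order=[x]",
]

for line in samples:
    m = FLEX_ID.search(line)
    print(m.group("hex_id") if m else None)
```

Inside rex, the same group would be written (?<hex_id>...). A looser pattern is more likely to pick up other hyphenated hex strings, so check a sample of the extracted values before relying on the dedup.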

