Splunk Search

Dedup in raw field

msmapper
Path Finder

When I write searches in Splunk, 90% of them are based on data that is only available in the _raw field, not in one of the indexed fields like host or sourcetype. My goal is to run a query that dedups on this portion of the _raw field: 34e6a6-6d0-4626-a319ce-24e6a63.

May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]

I can write a regex for that value, but when I run the query below I still get duplicate values:

index=application sourcetype=web | regex _raw = "\w\w\w\w\w\w-\w\w\w\w-\w\w\w\w\w\w-\w\w\w\w\w\w\w\" | dedup _raw

Can someone please let me know whether what I am trying to do is possible, and point me down the correct path?

Thanks in advance!!!

0 Karma
1 Solution

Damien_Dallimor
Ultra Champion

For the supplied log example, this would work:

Note: this assumes that the format of the hex ID is consistent across different log events.

index=application sourcetype=web | rex field=_raw "TIME\s(?<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})" | dedup hex_id
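
To make the extract-then-dedup idea concrete outside of Splunk, here is a minimal Python sketch of the same logic. It is only an illustration, not how Splunk implements `rex` or `dedup`: the function name `dedup_by_hex_id` and the second and third events are hypothetical, and Python spells the named group `(?P<hex_id>...)` rather than `(?<hex_id>...)`.

```python
import re

# Same pattern as the rex command above, in Python's named-group syntax.
HEX_ID = re.compile(r"TIME\s(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})")


def dedup_by_hex_id(events):
    """Keep the first event seen for each extracted hex_id (hypothetical helper)."""
    seen = set()
    kept = []
    for event in events:
        match = HEX_ID.search(event)
        if match is None:
            continue  # nothing extracted, so there is no value to dedup on
        hex_id = match.group("hex_id")
        if hex_id not in seen:
            seen.add(hex_id)
            kept.append(event)
    return kept


events = [
    # The sample event from the question.
    "May 16 16:34:09 server1 16:34:09,376 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    # Hypothetical later event carrying the same ID.
    "May 16 16:34:10 server1 16:34:10,101 WARN Servlet - TIME "
    "34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524 order=[abcdefg]",
    # Hypothetical event carrying a different ID.
    "May 16 16:34:11 server1 16:34:11,045 WARN Servlet - TIME "
    "aaaaaa-bbb-cccc-dddddd-eeeeeee 63.216.54.213:64524 order=[hijklmn]",
]

unique_events = dedup_by_hex_id(events)
print(len(unique_events))
```

The key difference from the original attempt is that the ID is captured into its own field first, so the comparison is made on the ID alone rather than on the whole of _raw, which differs between events because of the timestamp.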


msmapper
Path Finder

Damien,

Yes, the missing escape character must be due to a formatting issue in Splunkbase, because I am definitely using it. Also, the overall rex statement works perfectly fine, as the format is consistent across this particular log event; it's just that when I add "| dedup hex_id" to the query I get zero results. I removed the "TIME\s" from the query and everything worked correctly.
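
The thread does not say why dropping "TIME\s" helped. One possible explanation, offered only as an assumption, is that the text in front of the ID is not always exactly "TIME" followed by a single whitespace character. The Python sketch below compares the two patterns on the sample text and on a hypothetical variant with two spaces; the names `ANCHORED`, `UNANCHORED`, and `extract` are made up for the illustration.

```python
import re

# Pattern with the literal prefix, and the same pattern without it.
ANCHORED = re.compile(r"TIME\s(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})")
UNANCHORED = re.compile(r"(?P<hex_id>\w{6}-\w{3}-\w{4}-\w{6}-\w{7})")


def extract(pattern, line):
    """Return the captured hex_id, or None when the pattern does not match."""
    match = pattern.search(line)
    return match.group("hex_id") if match else None


# Text taken from the sample event in the question.
sample = "WARN Servlet - TIME 34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524"
# Hypothetical variant: two spaces after TIME instead of one.
variant = "WARN Servlet - TIME  34e6a6-6d0-4626-a319ce-24e6a63 63.216.54.213:64524"

print(extract(ANCHORED, sample))     # prefix matches, ID is captured
print(extract(ANCHORED, variant))    # prefix does not match, nothing captured
print(extract(UNANCHORED, variant))  # ID is captured regardless of the prefix
```

The trade-off is that the pattern without the prefix will also capture any other token in the event that happens to have the same 6-3-4-6-7 shape, so it is worth spot-checking the extracted values.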

Thank you very much for your help.

Jen

0 Karma

msmapper
Path Finder

Damien,

Thank you for your response, but unfortunately it didn't work. When I run the query
index=application sourcetype=web | rex field=_raw "TIME\s(?\w{6}-\w{3}-\w{4}-\w{6}-\w{7})"
I get about 200,000 results returned.

When I run index=application sourcetype=web | rex field=_raw "TIME\s(?\w{6}-\w{3}-\w{4}-\w{6}-\w{7})" | dedup hex_id, I get 0 results back.

Thoughts?
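
A search that returns results without dedup but none with it is what you would expect if the hex_id field was never extracted from any event: by default dedup drops events that do not have the field at all (in Splunk, `keepempty=true` changes that). The Python sketch below is a simplified model of that behaviour, not Splunk's implementation; the `dedup` function, its `keep_empty` parameter, and the two events are hypothetical.

```python
def dedup(events, field, keep_empty=False):
    """Simplified model of dedup: keep the first event per distinct field value.

    Events that lack the field are dropped unless keep_empty is True,
    loosely mirroring the keepempty option of Splunk's dedup command.
    """
    seen = set()
    kept = []
    for event in events:
        value = event.get(field)
        if value is None:
            if keep_empty:
                kept.append(event)
            continue
        if value not in seen:
            seen.add(value)
            kept.append(event)
    return kept


# Hypothetical events where the extraction never produced a hex_id field.
events = [{"_raw": "event one"}, {"_raw": "event two"}]

print(len(dedup(events, "hex_id")))                   # field missing everywhere
print(len(dedup(events, "hex_id", keep_empty=True)))  # missing-field events kept
```

A quick way to check this in a real search is to look at whether hex_id shows up as a field on the events before adding the dedup step.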

0 Karma

Damien_Dallimor
Ultra Champion

Your copy/paste above doesn't match my post: the escape character before the "s" and after "TIME" is missing. Maybe that's just a Splunkbase formatting quirk.

Furthermore, refer to the note in my original post: it assumes that the format of the hex ID is consistent across different log events. You only supplied one sample log event to work with, so if the pattern of the hex ID is variable, the regex pattern will need to be altered.

0 Karma
