Splunk Search

With regex, can you help us extract the first word that comes after the timestamp?

zacksoft
Contributor

I want to extract the first word that comes after the timestamp.

The timestamps come in varied formats (some include a timezone abbreviation, some do not).

example event1 :

2019-02-05 11:89:17,642 EST BROCOD bla bla bla ......

example event2 :

2019-02-05 19:35:18,642 MARC bla bla bla........

I want to extract BROCOD and MARC.

I tried the following. It should work, but I'm not sure why it is not showing me any results:

| rex "^(?:[^ \n]* ){3}(?P<level>\w+)" | table  level 

Vijeta
Influencer

I tried the below and it worked for me:

rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)"

Example-

|makeresults| eval x="2019-02-05 11:89:17,642 EST BROCOD bla bla bla" |appendpipe[|eval x="2019-02-05 19:35:18,642 MARC bla bla bla"]| rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)"
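The same check can be reproduced outside Splunk. The following is a minimal sketch in Python, not part of the original answer: it assumes Python's `re` module as a stand-in for Splunk's regex engine, so the group is spelled `(?P<level>...)` because `re` does not accept `(?<level>...)`. The helper name `extract_level` is hypothetical.

```python
import re

# Vijeta's pattern, with (?<level>...) rewritten as (?P<level>...) because
# Python's re module only accepts the (?P<name>...) spelling.
PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?P<level>\w+)"
)

# The two sample events from the question.
EVENTS = [
    "2019-02-05 11:89:17,642 EST BROCOD bla bla bla",
    "2019-02-05 19:35:18,642 MARC bla bla bla",
]


def extract_level(event):
    """Return the captured word, or None when the pattern does not match."""
    m = PATTERN.search(event)
    return m.group("level") if m else None


levels = [extract_level(e) for e in EVENTS]
print(levels)  # ['BROCOD', 'MARC']
```

One thing to keep in mind with this approach: `\w{0,3}` accepts any word of up to three characters as the "timezone", so a short first word such as `ERR` would be skipped and the word after it captured instead, and a four-letter abbreviation such as `AEST` would itself be captured as `level`.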

zacksoft
Contributor

Thanks Vijeta.
I am wondering how to implement it.
Instead of .......|appendpipe[|eval x="2019-02-05 19: ...........
I used ...|appendpipe[|eval x=_raw ...........
so that it scans all events, but it gives many errors:

index=myIndex host=myhost sourcetype="my.source.type"  |makeresults| eval x=_raw |appendpipe[|eval x=_raw]| rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)" | table level

Vijeta
Influencer

@zacksoft - did you try the below?

You do not need makeresults; I only used it to create sample events. Your query can be:

index=myIndex host=myhost sourcetype="my.source.type"  |rex field=_raw "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)" | table level

horsefez
SplunkTrust

Hey zacksoft,

this one is a bit complicated, as you can never be sure whether there will be an abbreviated timezone or not.

https://regex101.com/r/n1RYOu/2

So I found this solution for you, which might look a bit convoluted at first, but it basically matches all the possible timezone abbreviations we have at the moment, and only if they are there.

So please give it a careful look and ask me questions about it if you have any.

Regards,
pyro_wood

zacksoft
Contributor

Thanks @horsefez

Just to confirm, this is the regex, right? I am a bit new to regex.

index=DEMOhost=anything sourcetype="something.something"
rex "^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s(?:\b(?:ACDT|ACST|ACT|ACT|ACWST|ADT|AEDT|AEST|AFT|AKDT|AKST|AMST|AMT|AMT|ART|AST|AST|AWST|AZOST|AZOT|AZT|BDT|BIOT|BIT|BOT|BRST|BRT|BST|BST|BST|BTT|CAT|CCT|CDT|CDT|CEST|CET|CHADT|CHAST|CHOT|CHOST|CHST|CHUT|CIST|CIT|CKT|CLST|CLT|COST|COT|CST|CST|CST|CT|CVT|CWST|CXT|DAVT|DDUT|DFT|EASST|EAST|EAT|ECT|ECT|EDT|EEST|EET|EGST|EGT|EIT|EST|FET|FJT|FKST|FKT|FNT|GALT|GAMT|GET|GFT|GILT|GIT|GMT|GST|GST|GYT|HDT|HAEC|HST|HKT|HMT|HOVST|HOVT|ICT|IDLW|IDT|IOT|IRDT|IRKT|IRST|IST|IST|IST|JST|KALT|KGT|KOST|KRAT|KST|LHST|LHST|LINT|MAGT|MART|MAWT|MDT|MET|MEST|MHT|MIST|MIT|MMT|MSK|MST|MST|MUT|MVT|MYT|NCT|NDT|NFT|NPT|NST|NT|NUT|NZDT|NZST|OMST|ORAT|PDT|PET|PETT|PGT|PHOT|PHT|PKT|PMDT|PMST|PONT|PST|PST|PYST|PYT|RET|ROTT|SAKT|SAMT|SAST|SBT|SCT|SDT|SGT|SLST|SRET|SRT|SST|SST|SYOT|TAHT|THA|TFT|TJT|TKT|TLT|TMT|TRT|TOT|TVT|ULAST|ULAT|UTC|UYST|UYT|UZT|VET|VLAT|VOLT|VOST|VUT|WAKT|WAST|WAT|WEST|WET|WIT|WST|YAKT|YEKT)\b\s*)?(?<match>\w+)"
| table match

If yes, I tried this, but it yielded no results. 😞


horsefez
SplunkTrust

Hi @zacksoft,

Try this one and tell me if it works:

index=DEMO host=anything sourcetype=something 
| rex "^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s(?:\b(?:ACDT|ACST|ACT|ACT|ACWST|ADT|AEDT|AEST|AFT|AKDT|AKST|AMST|AMT|AMT|ART|AST|AST|AWST|AZOST|AZOT|AZT|BDT|BIOT|BIT|BOT|BRST|BRT|BST|BST|BST|BTT|CAT|CCT|CDT|CDT|CEST|CET|CHADT|CHAST|CHOT|CHOST|CHST|CHUT|CIST|CIT|CKT|CLST|CLT|COST|COT|CST|CST|CST|CT|CVT|CWST|CXT|DAVT|DDUT|DFT|EASST|EAST|EAT|ECT|ECT|EDT|EEST|EET|EGST|EGT|EIT|EST|FET|FJT|FKST|FKT|FNT|GALT|GAMT|GET|GFT|GILT|GIT|GMT|GST|GST|GYT|HDT|HAEC|HST|HKT|HMT|HOVST|HOVT|ICT|IDLW|IDT|IOT|IRDT|IRKT|IRST|IST|IST|IST|JST|KALT|KGT|KOST|KRAT|KST|LHST|LHST|LINT|MAGT|MART|MAWT|MDT|MET|MEST|MHT|MIST|MIT|MMT|MSK|MST|MST|MUT|MVT|MYT|NCT|NDT|NFT|NPT|NST|NT|NUT|NZDT|NZST|OMST|ORAT|PDT|PET|PETT|PGT|PHOT|PHT|PKT|PMDT|PMST|PONT|PST|PST|PYST|PYT|RET|ROTT|SAKT|SAMT|SAST|SBT|SCT|SDT|SGT|SLST|SRET|SRT|SST|SST|SYOT|TAHT|THA|TFT|TJT|TKT|TLT|TMT|TRT|TOT|TVT|ULAST|ULAT|UTC|UYST|UYT|UZT|VET|VLAT|VOLT|VOST|VUT|WAKT|WAST|WAT|WEST|WET|WIT|WST|YAKT|YEKT)\b\s*)?(?<level>\w+)"

zacksoft
Contributor

@pyro_wood - This is the most insane looking query. But it is awesome.. it works perfectly ......
You're a genius. Thank you very much.


horsefez
SplunkTrust

@zacksoft,

I agree that it looks complicated at first, and I'm glad that it works for you.

But it's not so complicated. Here is what each part does:
^ is called an anchor and points to the start of the line (will always be there).
\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s traverses the date and time fields (will always be there).
(?:\b(?:ACDT|ACST|ACT|ACWST...|BOT|...|WST|YAKT|YEKT)\b\s*)? looks for a valid timezone abbreviation, using a list of timezone abbreviations I found on the web.
It is basically an OR-list. If it doesn't find ACDT, it checks for ACST, then for ACT, and so on. The very last ? (question mark) makes the entire group enclosed in parentheses optional, which means the timezone may or may not be there.
(?<level>\w+) captures the word you are after. It comes next regardless of whether the optional timezone is present (will always be there).

You might have noticed the \b in the regex. \b marks a word boundary. Long story short, it makes sure that the timezone part doesn't match the beginning of longer words such as "ACTION", "BOTTOM", "PETS", "PHOTO" or "WESTWARDS".
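The effect of the optional group and of \b can be checked with a small sketch. This is only an illustration under stated assumptions: it uses Python's `re` module rather than Splunk (so the group is written `(?P<level>...)`), the timezone list is a shortened, hypothetical subset of the full list, and the names `build`, `first_word`, `WITH_BOUNDARY` and `WITHOUT_BOUNDARY` are made up for the example.

```python
import re

# Shortened, hypothetical subset of the timezone list, for readability only.
TZ = r"(?:ACT|BOT|EST|PET|PHOT|UTC|WEST)"


def build(tz_group):
    """Assemble: timestamp, then the given timezone group, then the capture."""
    return re.compile(
        r"^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s"  # date and time
        + tz_group                                        # optional timezone
        + r"(?P<level>\w+)"                               # first word after it
    )


WITH_BOUNDARY = build(r"(?:\b" + TZ + r"\b\s*)?")
WITHOUT_BOUNDARY = build(r"(?:" + TZ + r"\s*)?")


def first_word(pattern, event):
    """Return the captured word, or None when the pattern does not match."""
    m = pattern.search(event)
    return m.group("level") if m else None


tz_event = "2019-02-05 11:89:17,642 EST BROCOD bla bla bla"
no_tz_event = "2019-02-05 19:35:18,642 MARC bla bla bla"
tricky_event = "2019-02-05 19:35:18,642 ACTION taken"

print(first_word(WITH_BOUNDARY, tz_event))         # BROCOD
print(first_word(WITH_BOUNDARY, no_tz_event))      # MARC
print(first_word(WITH_BOUNDARY, tricky_event))     # ACTION
print(first_word(WITHOUT_BOUNDARY, tricky_event))  # ION
```

The last line shows what goes wrong without \b: ACT is consumed as if it were a timezone and only the remainder of the word is captured.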

Hope this helps a bit.
Regards,
pyro_wood


zacksoft
Contributor

Thanks for explaining each step. Now I understand.


lakshman239
SplunkTrust

You can check this out: https://regex101.com/r/cQF8aS/1
You need something like:

^.*,\d+\s+(?:EST)?\s?(?\w+)


zacksoft
Contributor

Thanks Lakshman.
When I try this, it says "unrecognized character after (? or (?-".
Also, in which field is the extraction stored?
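A likely explanation, offered as an assumption rather than something confirmed in this thread: the pattern as displayed has a capture group that starts with `(?` but carries no name, presumably because the `<name>` part was lost when the post was rendered. With rex, the extracted value is stored in a field named after the capture group, so the group needs a name such as `(?<level>\w+)`. The sketch below uses Python's `re` module (which spells the group `(?P<level>...)`) to show that the unnamed form is rejected and the named form works; `level` is an assumed field name.

```python
import re

EVENT = "2019-02-05 11:89:17,642 EST BROCOD bla bla bla"

# Pattern as it appears in the post: nothing valid follows "(?".
UNNAMED = r"^.*,\d+\s+(?:EST)?\s?(?\w+)"

# Same pattern with a group name restored; "level" is an assumed name.
NAMED = r"^.*,\d+\s+(?:EST)?\s?(?P<level>\w+)"

try:
    re.compile(UNNAMED)
    unnamed_ok = True
except re.error:
    # Python rejects the pattern too, much like the error Splunk reports.
    unnamed_ok = False

match = re.search(NAMED, EVENT)
level = match.group("level") if match else None

print(unnamed_ok)  # False
print(level)       # BROCOD
```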
