I want to extract the first word that comes after the timestamp.
The timestamps come in varied formats: some include a timezone abbreviation and some do not.
Example event 1:
2019-02-05 11:89:17,642 EST BROCOD bla bla bla ......
Example event 2:
2019-02-05 19:35:18,642 MARC bla bla bla........
I want to extract BROCOD and MARC.
I tried the following. It should work, but I'm not sure why it is not showing me any result:
| rex "^(?:[^ \n]* ){3}(?P<level>\w+)" | table level
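For reference, here is a minimal sketch of what that pattern does on the two sample events. It uses Python's `re` module rather than Splunk, so it does not reproduce the empty result in Splunk; the helper name `extract_fixed_skip` is made up for illustration. It only shows that skipping a fixed count of three space-delimited tokens cannot handle an optional timezone.

```python
import re

# Pattern from the question: skip three space-delimited tokens, then capture a word.
pattern = re.compile(r"^(?:[^ \n]* ){3}(?P<level>\w+)")

events = [
    "2019-02-05 11:89:17,642 EST BROCOD bla bla bla",  # date, time, timezone, word
    "2019-02-05 19:35:18,642 MARC bla bla bla",        # date, time, word (no timezone)
]

def extract_fixed_skip(event):
    """Return the captured word, or None if the pattern does not match."""
    m = pattern.search(event)
    return m.group("level") if m else None

results = [extract_fixed_skip(e) for e in events]
print(results)  # ['BROCOD', 'bla']
```

In the second event the third token is already the wanted word, so the capture lands one word too late.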
Hey zacksoft,
This one is a bit complicated, as you can never be sure whether there will be an abbreviated timezone or not.
https://regex101.com/r/n1RYOu/2
So I found this solution for you. It might look a bit convoluted at first, but it basically matches all the timezone abbreviations in use at the moment, and only if they are present.
So please give it a careful look and ask me questions about it if you have any.
Regards,
pyro_wood
I tried the following and it worked for me:
rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)"
Example:
|makeresults| eval x="2019-02-05 11:89:17,642 EST BROCOD bla bla bla" |appendpipe[|eval x="2019-02-05 19:35:18,642 MARC bla bla bla"]| rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)"
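To see how the optional `\s{0,1}\w{0,3}\s` part behaves, here is a minimal sketch in Python rather than SPL. It assumes Python's `re` module, which spells named groups `(?P<level>...)` instead of `(?<level>...)`; the function name `extract_level` and the last two sample events are made up for illustration.

```python
import re

# Same pattern as the rex above, with Python's (?P<name>...) group syntax.
PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?P<level>\w+)"
)

def extract_level(event):
    """Return the word captured as 'level', or None when nothing matches."""
    m = PATTERN.search(event)
    return m.group("level") if m else None

# The two events from the thread.
print(extract_level("2019-02-05 11:89:17,642 EST BROCOD bla bla bla"))  # BROCOD
print(extract_level("2019-02-05 19:35:18,642 MARC bla bla bla"))        # MARC

# Hypothetical events that expose the limits of \w{0,3}.
print(extract_level("2019-02-05 19:35:18,642 FOO bar"))          # bar
print(extract_level("2019-02-05 19:35:18,642 AEST BROCOD bla"))  # AEST
```

The last two cases show the trade-off: any word of up to three characters is treated as the timezone and skipped, while a four-letter abbreviation such as AEST is itself returned as the level.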
Thanks, Vijeta.
I am wondering how to implement it.
Instead of ...|appendpipe[|eval x="2019-02-05 19: ...
I used ...|appendpipe[|eval x=_raw ...
so that it scans all events, but it gives many errors:
index=myIndex host=myhost sourcetype="my.source.type" |makeresults| eval x=_raw |appendpipe[|eval x=_raw]| rex field=x "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)" | table level
@zacksoft - did you try the query below?
You don't need makeresults; I only used it to create sample events. Your query can be:
index=myIndex host=myhost sourcetype="my.source.type" |rex field=_raw "\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s{0,1}\w{0,3}\s(?<level>\w+)" | table level
Thanks @horsefez
Just to confirm, this is the regex, right? I am a bit new to this regex arena!
index=DEMOhost=anything sourcetype="something.something"
rex "^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s(?:\b(?:ACDT|ACST|ACT|ACT|ACWST|ADT|AEDT|AEST|AFT|AKDT|AKST|AMST|AMT|AMT|ART|AST|AST|AWST|AZOST|AZOT|AZT|BDT|BIOT|BIT|BOT|BRST|BRT|BST|BST|BST|BTT|CAT|CCT|CDT|CDT|CEST|CET|CHADT|CHAST|CHOT|CHOST|CHST|CHUT|CIST|CIT|CKT|CLST|CLT|COST|COT|CST|CST|CST|CT|CVT|CWST|CXT|DAVT|DDUT|DFT|EASST|EAST|EAT|ECT|ECT|EDT|EEST|EET|EGST|EGT|EIT|EST|FET|FJT|FKST|FKT|FNT|GALT|GAMT|GET|GFT|GILT|GIT|GMT|GST|GST|GYT|HDT|HAEC|HST|HKT|HMT|HOVST|HOVT|ICT|IDLW|IDT|IOT|IRDT|IRKT|IRST|IST|IST|IST|JST|KALT|KGT|KOST|KRAT|KST|LHST|LHST|LINT|MAGT|MART|MAWT|MDT|MET|MEST|MHT|MIST|MIT|MMT|MSK|MST|MST|MUT|MVT|MYT|NCT|NDT|NFT|NPT|NST|NT|NUT|NZDT|NZST|OMST|ORAT|PDT|PET|PETT|PGT|PHOT|PHT|PKT|PMDT|PMST|PONT|PST|PST|PYST|PYT|RET|ROTT|SAKT|SAMT|SAST|SBT|SCT|SDT|SGT|SLST|SRET|SRT|SST|SST|SYOT|TAHT|THA|TFT|TJT|TKT|TLT|TMT|TRT|TOT|TVT|ULAST|ULAT|UTC|UYST|UYT|UZT|VET|VLAT|VOLT|VOST|VUT|WAKT|WAST|WAT|WEST|WET|WIT|WST|YAKT|YEKT)\b\s*)?(?\w+)"
| table match
If yes, I tried this, but it yielded no result! 😞
Hi @zacksoft,
try this one and tell me if it works.
index=DEMO host=anything sourcetype=something
| rex "^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s(?:\b(?:ACDT|ACST|ACT|ACT|ACWST|ADT|AEDT|AEST|AFT|AKDT|AKST|AMST|AMT|AMT|ART|AST|AST|AWST|AZOST|AZOT|AZT|BDT|BIOT|BIT|BOT|BRST|BRT|BST|BST|BST|BTT|CAT|CCT|CDT|CDT|CEST|CET|CHADT|CHAST|CHOT|CHOST|CHST|CHUT|CIST|CIT|CKT|CLST|CLT|COST|COT|CST|CST|CST|CT|CVT|CWST|CXT|DAVT|DDUT|DFT|EASST|EAST|EAT|ECT|ECT|EDT|EEST|EET|EGST|EGT|EIT|EST|FET|FJT|FKST|FKT|FNT|GALT|GAMT|GET|GFT|GILT|GIT|GMT|GST|GST|GYT|HDT|HAEC|HST|HKT|HMT|HOVST|HOVT|ICT|IDLW|IDT|IOT|IRDT|IRKT|IRST|IST|IST|IST|JST|KALT|KGT|KOST|KRAT|KST|LHST|LHST|LINT|MAGT|MART|MAWT|MDT|MET|MEST|MHT|MIST|MIT|MMT|MSK|MST|MST|MUT|MVT|MYT|NCT|NDT|NFT|NPT|NST|NT|NUT|NZDT|NZST|OMST|ORAT|PDT|PET|PETT|PGT|PHOT|PHT|PKT|PMDT|PMST|PONT|PST|PST|PYST|PYT|RET|ROTT|SAKT|SAMT|SAST|SBT|SCT|SDT|SGT|SLST|SRET|SRT|SST|SST|SYOT|TAHT|THA|TFT|TJT|TKT|TLT|TMT|TRT|TOT|TVT|ULAST|ULAT|UTC|UYST|UYT|UZT|VET|VLAT|VOLT|VOST|VUT|WAKT|WAST|WAT|WEST|WET|WIT|WST|YAKT|YEKT)\b\s*)?(?<level>\w+)"
@pyro_wood - This is the most insane-looking query, but it is awesome. It works perfectly.
You're a genius. Thank you very much.
@zacksoft,
I agree that it looks complicated at first and I'm glad that it works out for you.
But it's not so complicated.
I will explain to you why it isn't as complicated as it might look.
^
This is called an anchor. It matches the start of the line (will always be there).
\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s*
This matches the date and time fields (will always be there).
(?:\b(?:ACDT|ACST|ACT|ACWST...|BOT|...|WST|YAKT|YEKT)\b\s*)?
This looks for a valid timezone abbreviation, using a list of all the valid timezone abbreviations I found on the web.
It is basically an OR-list: if it doesn't find ACDT, it checks for ACST; if not, it checks for ACT, and so on.
The question mark at the very end makes the entire group enclosed in parentheses optional, meaning the timezone may or may not be there.
(?<level>\w+)
Regardless of whether the optional timezone is present, the word you want to extract comes next (will always be there).
You might have noticed the \b in the regex. \b marks a word boundary. Long story short, it makes sure that the timezone part doesn't match the beginning of longer words such as "ACTION", "BOTTOM", "PETS", "PHOTO" or "WESTWARDS".
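The effect of the optional group and of \b can be checked with a small sketch. It is written in Python rather than SPL, so the named group is spelled `(?P<level>...)`; the timezone list is shortened to six entries for readability, and the names `TZ`, `PREFIX` and `first_word` as well as the "ACTION" event are made up for illustration.

```python
import re

# Shortened, illustrative list; the full pattern above enumerates many more.
TZ = "ACT|BOT|EST|PET|PHOT|WEST"

# Anchor plus date and time, as in the explanation above.
PREFIX = r"^\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2},\d+\s"

# Optional timezone group with and without word boundaries.
with_boundary = re.compile(PREFIX + r"(?:\b(?:" + TZ + r")\b\s*)?(?P<level>\w+)")
without_boundary = re.compile(PREFIX + r"(?:(?:" + TZ + r")\s*)?(?P<level>\w+)")

def first_word(pattern, event):
    """Return the word captured as 'level', or None when nothing matches."""
    m = pattern.search(event)
    return m.group("level") if m else None

with_tz = "2019-02-05 11:89:17,642 EST BROCOD bla bla bla"
no_tz = "2019-02-05 19:35:18,642 MARC bla bla bla"
tricky = "2019-02-05 19:35:18,642 ACTION taken"  # starts with the abbreviation ACT

print(first_word(with_boundary, with_tz))    # BROCOD
print(first_word(with_boundary, no_tz))      # MARC
print(first_word(with_boundary, tricky))     # ACTION
print(first_word(without_boundary, tricky))  # ION
```

Without \b the timezone group consumes "ACT" and only "ION" is left for the capture; with \b the group is skipped and the whole word is captured.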
Hope this helps a bit.
Regards,
pyro_wood
Thanks for explaining each step. Now I understand.
You can check this out - https://regex101.com/r/cQF8aS/1
You need something like
^.*,\d+\s+(?:EST)?\s?(?\w+)
Thanks Lakshman.
When I try this it says "unrecognized character after (? or (?-"
Also, what is the name of the field where the extraction gets stored?
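One likely reading of that error: a group that opens with `(?` and is followed directly by `\w+` has no name, so the regex engine rejects it. With rex, the name of the capture group is also the name of the extracted field, so the name cannot be left out. The sketch below is Python, not SPL, and only illustrates the same rule; the group name `level` is borrowed from the other answers in this thread as an assumption, and the helpers `compiles` and `extract` are made up.

```python
import re

def compiles(pattern):
    """Return True if the pattern is valid for Python's re module."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The pattern as posted: the group after (? has no name.
unnamed = r"^.*,\d+\s+(?:EST)?\s?(?\w+)"

# Same pattern with a named group (Python spelling; rex also accepts (?<level>...)).
named = r"^.*,\d+\s+(?:EST)?\s?(?P<level>\w+)"

print(compiles(unnamed))  # False
print(compiles(named))    # True

def extract(event):
    """Return a dict mapping group name to captured text; empty if no match."""
    m = re.search(named, event)
    return m.groupdict() if m else {}

print(extract("2019-02-05 11:89:17,642 EST BROCOD bla bla bla"))  # {'level': 'BROCOD'}
print(extract("2019-02-05 19:35:18,642 MARC bla bla bla"))        # {'level': 'MARC'}
```

Note that this pattern only skips the literal abbreviation EST; any other timezone would be captured as the level.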