I have a fairly hefty chunk of JSON from the RabbitMQ REST API.
In props.conf I have:
[json_no_timestamp]
TRUNCATE = 500000
In transforms.conf I have:
[CFBPFCCmessages]
REGEX = (?U)()"messages":(?P<CFBPFCCmessages>\d+)
WRITE_META = true
FORMAT = CFBPFCCmessages::$2
[CFBPFfailed]
REGEX = (?U)()"messages":.+"messages":(?P<CFBPFfailed>\d+),"messages
WRITE_META = true
FORMAT = CFBPFfailed::$2
[CFBPFmobile]
REGEX = (?U)()"messages":.+"messages":.+"messages":(?P<CFBPFmobile>\d+),"messages
WRITE_META = true
FORMAT = CFBPFmobile::$2
[CFBPFonboard]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFonboard>\d+),"messages
WRITE_META = true
FORMAT = CFBPFonboard::$2
[CFBPFticketoffice]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFticketoffice>\d+),"messages
WRITE_META = true
FORMAT = CFBPFticketoffice::$2
[CFBPFtvm]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFtvm>\d+),"messages
WRITE_META = true
FORMAT = CFBPFtvm::$2
[CFBPFunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFunknown>\d+),"messages
WRITE_META = true
FORMAT = CFBPFunknown::$2
[CFBPFweb]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPFweb>\d+),"messages
WRITE_META = true
FORMAT = CFBPFweb::$2
[CFBPMemail]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMemail>\d+),"messages
WRITE_META = true
FORMAT = CFBPMemail::$2
[CFBPMfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMfailed>\d+),"messages
WRITE_META = true
FORMAT = CFBPMfailed::$2
[CFBPMsms]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMsms>\d+),"messages
WRITE_META = true
FORMAT = CFBPMsms::$2
[CFBPMunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFBPMunknown>\d+),"messages
WRITE_META = true
FORMAT = CFBPMunknown::$2
[CFGPFCCmessages]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFCCmessages>\d+)
WRITE_META = true
FORMAT = CFGPFCCmessages::$2
[CFGPFfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFfailed>\d+),"messages
WRITE_META = true
FORMAT = CFGPFfailed::$2
[CFGPFmobile]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFmobile>\d+),"messages
WRITE_META = true
FORMAT = CFGPFmobile::$2
[CFGPFonboard]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFonboard>\d+),"messages
WRITE_META = true
FORMAT = CFGPFonboard::$2
[CFGPFticketoffice]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFticketoffice>\d+),"messages
WRITE_META = true
FORMAT = CFGPFticketoffice::$2
[CFGPFtvm]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFtvm>\d+),"messages
WRITE_META = true
FORMAT = CFGPFtvm::$2
[CFGPFunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFunknown>\d+),"messages
WRITE_META = true
FORMAT = CFGPFunknown::$2
[CFGPFweb]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPFweb>\d+),"messages
WRITE_META = true
FORMAT = CFGPFweb::$2
[CFGPMemail]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMemail>\d+),"messages
WRITE_META = true
FORMAT = CFGPMemail::$2
[CFGPMfailed]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMfailed>\d+),"messages
WRITE_META = true
FORMAT = CFGPMfailed::$2
[CFGPMsms]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMsms>\d+),"messages
WRITE_META = true
FORMAT = CFGPMsms::$2
[CFGPMunknown]
REGEX = (?U)()"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":.+"messages":(?P<CFGPMunknown>\d+),"messages
WRITE_META = true
FORMAT = CFGPMunknown::$2
When indexing, I only get the first three fields; none of the fields after CFBPFmobile are indexed.
I was considering MATCH_LIMIT. Would that help?
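To sanity-check what each spelled-out stanza is meant to extract, here is a minimal Python sketch (not Splunk itself) that builds the same "value after the nth "messages": key" pattern and compares it with a single pass that lists every "messages" value. Assumptions: the EVENT string is hypothetical, modeled on the snippet quoted later in this thread; Python's re module has no (?U) ungreedy flag, so the sketch writes the lazy .+? explicitly, which is what .+ means under (?U).

```python
import re
from typing import List, Optional

# Hypothetical event modeled on the snippet quoted in this thread; values are made up.
EVENT = (
    '[{"memory":21904,"messages":0,"messages_details":{"rate":0.0}},'
    '{"memory":100,"messages":7,"messages_details":{"rate":0.0}},'
    '{"memory":200,"messages":42,"messages_details":{"rate":0.0}}]'
)


def nth_messages(event: str, n: int) -> Optional[str]:
    """Mimic the spelled-out stanzas: the value after the nth '"messages":' key.

    Python's re has no (?U) flag, so '.+?' is written where (?U) makes '.+' lazy.
    """
    prefix = '"messages":' + '.+?"messages":' * (n - 1)
    # The trailing ',"messages' mirrors the stanzas; here it matches ',"messages_details'.
    m = re.search(prefix + r'(?P<value>\d+),"messages', event)
    return m.group("value") if m else None


def all_messages(event: str) -> List[str]:
    """Every '"messages":<digits>' value, in order, from a single pass."""
    return re.findall(r'"messages":(\d+)', event)


for i in range(1, 5):
    print(i, nth_messages(EVENT, i))
print(all_messages(EVENT))
```

Note that '"messages":' never matches '"messages_details":', because the colon must follow the closing quote immediately; that is why each stanza skips cleanly from one count to the next.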
Hi all, I solved this after a major deep dive.
The regex provided is a great improvement, thanks for that, although it still only works with the ungreedy (?U) prefix.
The missing piece was LOOKAHEAD: it defaults to 4096 characters (4 KB), and the regex will not search further into the event than that unless you raise it.
Now, each stanza looks like this:
[CFGPFweb]
REGEX = (?U)"messages":(.+"messages":){19}(?P<CFGPFweb>\d+),"messages
LOOKAHEAD = 65535
WRITE_META = true
FORMAT = CFGPFweb::$2
And it works. I had to raise LOOKAHEAD to 65535 (about 64 KB) in every stanza. I'm not sure how much overhead that adds, but I only receive one JSON message per minute.
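To illustrate why LOOKAHEAD rather than TRUNCATE was the limit, here is a minimal Python sketch that approximates LOOKAHEAD by searching only the first N characters of the event. This is a model, not Splunk's actual implementation: the padded build_event() payload and the extract_nth() helper are hypothetical. TRUNCATE governs how long the stored event may be; the window below governs how far into it the regex looks, which would be consistent with fields failing once their "messages" key sits past roughly the first 4096 characters.

```python
import re
from typing import Optional


def build_event(values, padding=1500):
    """Hypothetical event: each object is padded so later keys sit deep in the string."""
    parts = [
        '{"pad":"%s","messages":%d,"messages_details":{}}' % ("x" * padding, v)
        for v in values
    ]
    return "[" + ",".join(parts) + "]"


def extract_nth(event: str, n: int, lookahead: int = 4096) -> Optional[str]:
    """Search only the first `lookahead` characters, approximating LOOKAHEAD."""
    window = event[:lookahead]
    # '.+?' stands in for '.+' under PCRE's (?U); Python's re has no ungreedy flag.
    pattern = '"messages":' + '(?:.+?"messages":)' * (n - 1) + r'(\d+),"messages'
    m = re.search(pattern, window)
    return m.group(1) if m else None


event = build_event([11, 22, 33, 44])
print(len(event))  # far below TRUNCATE = 500000, but well past 4096
for n in range(1, 5):
    print(n, extract_nth(event, n), extract_nth(event, n, lookahead=65535))
```

With the default window only the first two values are found; with 65535 all four are, even though the event itself never changed length.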
You might want to try this to make your regex a bit cleaner:
[CFBPFCCmessages]
REGEX = (?U)()"messages":(?P<CFBPFCCmessages>\d+)
WRITE_META = true
FORMAT = CFBPFCCmessages::$2
[CFBPFfailed]
REGEX = (?U)()"messages":(?:.+"messages":){1}(?P<CFBPFfailed>\d+),"messages
WRITE_META = true
FORMAT = CFBPFfailed::$2
[CFBPFmobile]
REGEX = (?U)()"messages":(?:.+"messages":){2}(?P<CFBPFmobile>\d+),"messages
WRITE_META = true
FORMAT = CFBPFmobile::$2
...
...
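Whether the repeated group is capturing matters for FORMAT, because $n counts every capturing group from the left, including the empty () at the start. The minimal Python sketch below compares a capturing (...){2} with a non-capturing (?:...){2}; it assumes Splunk's $n follows the same left-to-right group numbering as Python and PCRE, and uses .+? in place of .+ under (?U). The EVENT string is hypothetical.

```python
import re

# Hypothetical three-object event; values are made up.
EVENT = (
    '[{"messages":0,"messages_details":{}},'
    '{"messages":7,"messages_details":{}},'
    '{"messages":42,"messages_details":{}}]'
)

# Empty group (), then a CAPTURING repeat group, then the named value:
# the value lands in group 3, and group 2 holds the last repetition's text.
capturing = re.compile(r'()"messages":(.+?"messages":){2}(?P<CFBPFmobile>\d+),"messages')

# Same pattern with the repeat group made non-capturing (?:...): the value is group 2.
non_capturing = re.compile(r'()"messages":(?:.+?"messages":){2}(?P<CFBPFmobile>\d+),"messages')

m1 = capturing.search(EVENT)
m2 = non_capturing.search(EVENT)
print(repr(m1.group(2)), m1.group(3))
print(m2.group(2))
```

Either fix works: keep the repeat group non-capturing so $2 is the value, or drop the leading () so the capturing repeat group becomes $1 and the value $2.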
I'm not familiar with RabbitMQ, but because you are not explicitly anchoring the regex to the start of the string with ^, it's possible you are getting inconsistent matches.
What is in your event before the first "messages" entry?
The first match failure is reported at bytes 4959-4960, on the CFBPFonboard field, and every field after that fails as well.
regex101's performance stats show 39232 steps, taking about 73 ms.
Is this operation too expensive for the regex engine?
Thanks
Hi, I tried this and got exactly the same result. I suspect it has something to do with truncation of the event or some limit on the regex input buffer: although I have set TRUNCATE = 500000, perhaps that setting is not respected by the regex?
Here is the event up to and including the first occurrence:
[{"memory":21904,"reductions":413518,"reductions_details":{"rate":0.0},"messages":0,"messages_details":
It all works on regex101.com using PCRE, but only when I specify the ungreedy option, hence the (?U).
I will try what you have done. However, when I used the {n} quantifier on regex101, it went haywire and started selecting 1, 2, 3 characters and then nothing, as if it were counting characters rather than occurrences.
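One plausible cause (an assumption, since the exact pattern tried on regex101 isn't shown) is that a quantifier such as {n} applies only to the single atom immediately before it. A minimal Python sketch on a hypothetical three-value string shows the difference between an ungrouped and a grouped {n}; .+? stands in for .+ under (?U).

```python
import re

text = '"messages":1,"messages":2,"messages":3'

# {n} binds only to the atom right before it.
print(re.match(r'.{3}', text).group())     # '.' repeated: exactly three characters
print(re.search(r'"messages":{2}', text))  # ':' repeated: needs '"messages::', never present

# Wrapping the whole unit in a group makes {n} count occurrences instead.
m = re.search(r'"messages":(?:.+?"messages":){2}(\d+)', text)
print(m.group(1))
```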
I can't post the JSON here because it's too large; it is very uniform and strictly formatted, with no line breaks.
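Because the payload is uniform JSON, a JSON parser is an alternative to counting "messages" keys with positional regexes. This is a different technique from the transforms in this thread and runs outside Splunk; the sketch below is a minimal illustration under stated assumptions: the RAW payload is hypothetical, shaped like the snippet quoted above, and the per-object "name" key is assumed.

```python
import json
from typing import Dict, List

# Hypothetical payload shaped like the snippet quoted in this thread; the "name"
# keys and all values are made up for illustration.
RAW = (
    '[{"name":"queue-a","memory":21904,"messages":0,"messages_details":{"rate":0.0}},'
    '{"name":"queue-b","memory":100,"messages":7,"messages_details":{"rate":0.0}}]'
)


def messages_by_position(raw: str) -> List[int]:
    """Top-level "messages" count of each object, in document order."""
    return [obj["messages"] for obj in json.loads(raw)]


def messages_by_name(raw: str) -> Dict[str, int]:
    """Same counts keyed by each object's (assumed) "name" field."""
    return {obj["name"]: obj["messages"] for obj in json.loads(raw)}


print(messages_by_position(RAW))
print(messages_by_name(RAW))
```

Keying by a name field, where one exists, avoids tying each field to its position, which a positional regex would silently get wrong if the order of objects ever changed.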