We see that the following log lines are always split into multiple events. I've tried multiple variations of LINE_BREAKER, BREAK_ONLY_BEFORE and MUST_NOT_BREAK_AFTER, but nothing worked. Does anyone know how I could keep each block together as a single event?
--------------------------------------------------
FlowFile Properties
Key: 'entryDate' Value: 'Wed Jan 04 16:14:58 UTC 2023'
Key: 'lineageStartDate' Value: 'Wed Jan 04 16:14:58 UTC 2023'
Key: 'fileSize' Value: '180'
FlowFile Attribute Map Content
--------------------------------------------------
Hi @dnavara,
@richgalloway's answer is correct; remember to also add
SHOULD_LINEMERGE = True
Ciao.
Giuseppe
At the risk of repeating one of the "multiple variations", have you tried
LINE_BREAKER = -{50}([\r\n]+)
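To see what this pattern does before deploying it, here is a rough Python sketch that approximates LINE_BREAKER semantics: the event ends where the first capture group starts, the captured text is discarded, and the next event begins after it. This is only an approximation and not Splunk itself. The `split_events` helper and the exact line layout of the sample record are assumptions made for illustration.

```python
import re

# Pattern from the reply above: 50 dashes followed by one or more line endings.
LINE_BREAKER = re.compile(r"-{50}([\r\n]+)")


def split_events(raw, breaker=LINE_BREAKER):
    """Approximate LINE_BREAKER: cut at capture group 1 and discard it."""
    events = []
    start = 0
    for match in breaker.finditer(raw):
        events.append(raw[start:match.start(1)])  # event ends before group 1
        start = match.end(1)                      # next event starts after it
    if start < len(raw):
        events.append(raw[start:])                # trailing text, if any
    return events


DASHES = "-" * 50

# Assumed layout of one record; the real line breaks may differ.
record = "\n".join([
    DASHES,
    "FlowFile Properties",
    "Key: 'entryDate' Value: 'Wed Jan 04 16:14:58 UTC 2023'",
    "Key: 'fileSize' Value: '180'",
    "FlowFile Attribute Map Content",
    DASHES,
]) + "\n"

events = split_events(record * 2)
for number, event in enumerate(events):
    print(number, repr(event))
```

In this approximation the pattern also matches the opening dashed line, so each record comes out as two pieces: the opening dashes on their own, then the body with the closing dashes. It may be worth checking whether the same happens in your indexed data.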
Hi, thanks for the answer. I've tried this, and it works when I upload test data manually, but for some reason it doesn't work when I upload it to the HF. Is there any way these logs could already be broken up before they arrive? I've noticed that the logs have slightly different timestamps in the JSON format, for example 2023-01-05T08:56:20.916403009Z and 2023-01-05T08:56:20.916400199Z. I am not sure whether this is because they arrived at different times or because of some processing time on the HF.
If the data is processed by an HF, then the LINE_BREAKER must be set in props.conf on the HF, since that is where event breaking happens.