Getting Data In

Why is my LINE_BREAKER parameter not breaking properly with multiple capture groups?

andrewtrobec
Motivator
 

Hello, I need help perfecting a sourcetype: it doesn't index my JSON files correctly when I define multiple capture groups within the LINE_BREAKER parameter.

I'm using this other question to try to figure out how to make it work: https://community.splunk.com/t5/Getting-Data-In/How-to-handle-LINE-BREAKER-regex-for-multiple-captur... 

In my case, my JSON looks like this:

[{"Field 1": "Value 1", "Field N": "Value N"}, {"Field 1": "Value 1", "Field N": "Value N"}, {"Field 1": "Value 1", "Field N": "Value N"}]

Initially I tried:

LINE_BREAKER = }(,\s){

This split the events correctly, except for the first and last records, which were not indexed correctly because of the leading "[" and trailing "]" characters around the payload.
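(Editor's note: for anyone who wants to reproduce this symptom outside Splunk, here is a small Python sketch. The `simulate_line_breaker` helper is purely illustrative, not a Splunk API; it only mimics Splunk's documented LINE_BREAKER rule that the event boundary sits at the first capture group and the captured separator text is discarded.)

```python
import re

# Illustrative only: mimic Splunk's LINE_BREAKER rule that the raw
# stream is broken at the first capture group and the captured
# separator text is discarded.
def simulate_line_breaker(raw, pattern):
    events, pos = [], 0
    for m in re.finditer(pattern, raw):
        events.append(raw[pos:m.start(1)])  # event ends where the group starts
        pos = m.end(1)                      # next event starts after the group
    events.append(raw[pos:])
    return events

raw = '[{"Field 1": "v1"}, {"Field 1": "v2"}, {"Field 1": "v3"}]'
print(simulate_line_breaker(raw, r'\}(,\s)\{'))
# The first event keeps the leading "[" and the last event keeps the
# trailing "]" -- exactly the symptom described above.
```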

After many attempts I have been unable to make it work. Based on what I've read, this seems to be the most intuitive way to define the capture groups:

LINE_BREAKER = ^([){|}(,\s){|}(])$

It doesn't work: it indexes the entire payload as one event, formatted correctly but unusable.

Could somebody please suggest how to correctly define the LINE_BREAKER parameter for the sourcetype?  Here is the full version I'm using:

[area:prd:json]
SHOULD_LINEMERGE = false
TRUNCATE = 8388608
TIME_PREFIX = \"Updated\sdate\"\:\s\"
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = Europe/Paris
MAX_TIMESTAMP_LOOKAHEAD = -1
KV_MODE = json
LINE_BREAKER = ^([){|}(,\s){|}(])$

Other resolutions to my problem are welcome as well!

Best regards,

Andrew

1 Solution

lucacaldiero
Path Finder

Dear Splunk user,

using this sample data

[{"Field 859": "Value aaaaa", "Field 2": "Value bbbbb"}, {"Field 1": "Value ccccc", "Field 2": "Value ddddd"}, {"Field 1": "Value eeeee", "Field 2": "Value fffff"}]
[{"Field 759": "Value ggggg", "Field 2": "Value hhhhh"}, {"Field 1": "Value iiiii", "Field 2": "Value jjjjj"}, {"Field 1": "Value kkkkk", "Field 2": "Value lllll"}]

with this props.conf

[trbndrw_temp]
DATETIME_CONFIG = CURRENT
SHOULD_LINEMERGE = false
LINE_BREAKER = (?:\}(\s*,\s*)\{)|(\][\r\n]+\[)
TRANSFORMS-getrid = getridht

and this transforms.conf

[getridht]
INGEST_EVAL = _raw=replace(_raw, "(\[|\])","")

you may be able to achieve what you want
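(Editor's note: here is a rough Python sketch of what this combination does, assuming Splunk's documented behaviour of breaking at whichever capture group matched, discarding the captured separator, and then applying the replace() from the INGEST_EVAL. The simulation helper is illustrative only.)

```python
import re, json

# Rough sketch, not Splunk itself: break at whichever capture group
# matched (the captured separator is discarded), then strip "[" and
# "]" the way the INGEST_EVAL replace() does.
def simulate_line_breaker(raw, pattern):
    events, pos = [], 0
    for m in re.finditer(pattern, raw):
        grp = next(i for i in range(1, m.re.groups + 1) if m.group(i) is not None)
        events.append(raw[pos:m.start(grp)])
        pos = m.end(grp)
    events.append(raw[pos:])
    return events

raw = ('[{"Field 1": "a"}, {"Field 1": "b"}]\n'
       '[{"Field 1": "c"}, {"Field 1": "d"}]')
pattern = r'(?:\}(\s*,\s*)\{)|(\][\r\n]+\[)'
events = [re.sub(r'(\[|\])', '', e) for e in simulate_line_breaker(raw, pattern)]
for e in events:
    print(json.loads(e))  # every event parses as a clean JSON object
```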

Happy splunking
Luca (aka "one DASH is always better")


andrewtrobec
Motivator

Thanks Luca, this works!  Appreciated!


isoutamo
SplunkTrust
Hi
Based on your TIME_PREFIX, your example is not a complete sample! If you want us to help you, we really need the whole example JSON file.
r. Ismo

andrewtrobec
Motivator

Thank you @isoutamo for the response. Here is a more accurate version of the payload:

[
    {
        "Assigned to": "Jones, Francis",
        "Cost": 3,
        "Created date": "2024-02-28 12:52:18",
        "Extraction date": "2024-03-02 13:51:00",
        "ID": 12345,
        "Initial Cost": 3,
        "Location": "Sites",
        "Path": "Sites\\FY1\\S3",
        "Priority": 1,
        "State": "In Progress",
        "Status Change date": "2024-03-05 16:33:23",
        "Tags": "Europe; Finance",
        "Title": "Ensure correct routing of orders",
        "Updated date": "2024-03-05 16:33:23",
        "Warranty": false,
        "Wave Quarter": "Q2 22",
        "Work Item Type": "Request"
    },
    {
        "Assigned to": "Jones, Francis",
        "Cost": 3,
        "Created date": "2024-02-28 18:59:18",
        "Extraction date": "2024-03-05 16:31:00",
        "ID": 12345,
        "Initial Cost": 3,
        "Location": "Sites",
        "Path": "Sites\\FY1\\S3",
        "Priority": 1,
        "State": "In Progress",
        "Status Change date": "2024-03-05 16:33:23",
        "Tags": "Europe; Finance",
        "Title": "Ensure correct routing of orders",
        "Updated date": "2024-03-05 16:33:23",
        "Warranty": false,
        "Wave Quarter": "Q2 22",
        "Work Item Type": "Request"
    },
    {
        "Assigned to": "Jones, Francis",
        "Cost": 3,
        "Created date": "2023-01-28 18:59:18",
        "Extraction date": "2023-02-05 16:31:00",
        "ID": 12345,
        "Initial Cost": 3,
        "Location": "Sites",
        "Path": "Sites\\FY1\\S3",
        "Priority": 1,
        "State": "In Progress",
        "Status Change date": "2023-02-05 16:33:23",
        "Tags": "Europe; Finance",
        "Title": "Ensure correct routing of orders",
        "Updated date": "2024-03-05 16:33:23",
        "Warranty": false,
        "Wave Quarter": "Q2 22",
        "Work Item Type": "Request"
    }
]

isoutamo
SplunkTrust

Thanks.

This seems to work:

LINE_BREAKER = (\[[\s\n\r]*\{|\},[\s\n\r]+\{|\}[\s\n\r]*)

Why doesn't your regex work?

Splunk needs only one capture group for the line break. You have three separate groups, even though you have tried to make them selectable with |. You also need to escape some of those characters (like [ { ] }) for them to be recognised as literals. You can test this with https://regex101.com/r/IGQHd7/1
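(Editor's note: as a concrete illustration of the escaping problem, assuming PCRE parses this pattern the same way Python's re module does, the unescaped "[" in the original attempt silently opens a character class, so the pattern degenerates and never matches the separators:)

```python
import re

# In the original attempt the unescaped "[" opens a character class,
# so "^([){|}(,\s){|}(])$" really means "^(<one char from a class>)$".
# It can never match the "}, {" separators, which is why the whole
# payload ended up indexed as a single event.
attempt = re.compile(r'^([){|}(,\s){|}(])$')
raw = '[{"Field 1": "v1"}, {"Field 1": "v2"}]'
print(attempt.search(raw))  # None: no break points found
```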

When I test these, I just use regex101.com and/or the Splunk GUI -> Settings -> Add Data -> Upload with an example file on my own laptop/workstation/dev server. That way it's easy to change the values and check how they affect the result.

You should also change

MAX_TIMESTAMP_LOOKAHEAD = 20

As you define TIME_PREFIX, there is no reason to use -1 as the lookahead value. Splunk starts looking for the timestamp after the defined prefix, and as you can see, the correct timestamp falls within 20 characters of it.

Why have you set KV_MODE = json? Once you have broken this JSON into separate events, it is no longer JSON as a format; each event is just a regular text-based event.
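(Editor's note: a small Python sketch of why this is so, again only simulating Splunk's documented rule that the captured separator is discarded. With the LINE_BREAKER above, the braces live inside the capture group, so each event becomes a bare field list rather than a JSON object:)

```python
import re, json

# The braces are consumed by the (discarded) capture group, so each
# event is a bare field list, not a JSON object.
pattern = r'(\[[\s\n\r]*\{|\},[\s\n\r]+\{|\}[\s\n\r]*)'
raw = '[\n  {\n    "ID": 1\n  },\n  {\n    "ID": 2\n  }\n]'
pieces, pos = [], 0
for m in re.finditer(pattern, raw):
    pieces.append(raw[pos:m.start(1)])
    pos = m.end(1)

event = pieces[1]
print(repr(event.strip()))      # '"ID": 1' -- no surrounding braces
try:
    json.loads(event)
    valid_json = True
except ValueError:
    valid_json = False
print(valid_json)               # False: KV_MODE=json has nothing to parse
```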


andrewtrobec
Motivator

Thank you for the feedback!  I will take your suggestions into consideration!
