I have applied the settings below on the heavy forwarders, but the regex in BREAK_ONLY_BEFORE works for only a few events; most events are not being broken and parsed correctly.
pulldown_type = 1
SEDCMD-backslash=s/\\//g
TRUNCATE = 0
BREAK_ONLY_BEFORE = {\”name\”
DATETIME_CONFIG = CURRENT
INDEXED_EXTRACTIONS = json
KV_MODE = json
category = Structured
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
Sample logs are below.
{\"name\":\"\",\"\":,\"severity\":\"info\",\"time\":,\"host\":\"\",\"hostname\":\"\",\"\":\"\",\"\":\"UNKNOWN CORRELATION\",\"userId\":\"UNKNOWN USER\",\"moduleName\":\"\",\"\":\"a\",\"client\":\"AgentDesktop\",\"type\":\"application\",\"msg\":\"\",\"\":\"\"}{\"name\":\"\",\"level\":30,\"\":\"info\",\"time\":,\"host\":\"\",\"hostname\":\"\",\"\":\"\",\"clientCorrelationId\":\"\",\"userId\":\"UNKNOWN
The same stanza on the heavy forwarder works for some events but not for others. Can someone let me know what could be wrong?
Hi,
SHOULD_LINEMERGE must be true, because BREAK_ONLY_BEFORE is only applied when line merging is enabled. I also made a small adjustment to your regex. Try the settings below.
props.conf:
BREAK_ONLY_BEFORE = \{\W+name
SHOULD_LINEMERGE = true
Thanks! But how did my stanza work for one event and not for another? Why was it not working for all the events with the same pattern? Also, with the regex you provided, I want to break only at name and at the brace before it. Will this break the event at the name field?
I am not sure how it worked for the first event. Your regex did not match the event (tested here). The backslash before the quotes must itself be escaped in order to match \".
I updated my regex above. It now looks for { before name.
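The point about the unescaped backslash can be checked outside Splunk. The following is a minimal sketch using Python's re module as a stand-in for Splunk's PCRE engine (an assumption; the two engines agree on these simple patterns as far as I know). The sample string is a shortened version of the escaped log line from the question.

```python
import re

# One record in the escaped form from the sample logs: every quote is preceded by a backslash.
sample = r'{\"name\":\"\",\"severity\":\"info\"}'

# In a regex, \" is only an escaped quote, so this pattern looks for {"name" without backslashes.
quote_only = re.search(r'{\"name\"', sample)

# Escaping the backslash itself (\\) makes the pattern match the literal \" sequence.
escaped_backslash = re.search(r'\{\\"name\\"', sample)

# The revised pattern sidesteps the issue: \W+ accepts any run of non-word characters.
non_word = re.search(r'\{\W+name', sample)

print(quote_only)  # None
print(escaped_backslash.group())
print(non_word.group())
```

Only the first pattern fails, which is consistent with the explanation that the original regex never matched the escaped text.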
Hi Surya,
Thanks! I will try to implement it. Also, could you let me know what regex can be applied to the log sample below to break at the name field?
{\"name\":\"\",\"level\":,\"severity\":\"info\",\"time
If events are multi-line, then try (?m)\{\W+name
(?m) - multi-line modifier.
\{ - matches { literally.
\W+ - matches one or more non-word characters. If you're sure about the number of characters between { and name, use a bounded quantifier, for example \W{1,3}, which matches a minimum of 1 and a maximum of 3 characters instead of 1 to unlimited.
name - matches name literally (case-sensitive).
Please refer to this page for more details.
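The breakdown above can be tried against sample data. This is a rough sketch in Python's re module, assumed here to behave like Splunk's PCRE for these patterns; the records are shortened versions of the sample logs, and the `starts` helper and the `spaced` string are made up for illustration.

```python
import re

# Two records on separate lines, in the escaped form shown in the sample logs.
escaped = r'{\"name\":\"\",\"level\":30}' + "\n" + r'{\"name\":\"\",\"severity\":\"info\"}'
# The same records once the backslashes have been stripped.
plain = escaped.replace("\\", "")
# A hypothetical pretty-printed record with six non-word characters between { and name.
spaced = '{  \n  "name": ""}'

def starts(pattern, text):
    """Return the offsets at which the pattern matches in text."""
    return [m.start() for m in re.finditer(pattern, text)]

unbounded = starts(r'\{\W+name', escaped)       # one or more non-word characters
bounded = starts(r'\{\W{1,3}name', escaped)     # between one and three
with_flag = starts(r'(?m)\{\W+name', escaped)   # same pattern with the (?m) modifier
plain_hits = starts(r'\{\W+name', plain)        # works on unescaped JSON as well

print(unbounded == bounded == with_flag)        # True
print(starts(r'\{\W{1,3}name', spaced))         # []
print(starts(r'\{\W+name', spaced))             # [0]
```

In Python's re the (?m) flag only changes how ^ and $ behave, so for this unanchored pattern the matches are identical with and without it. The bounded quantifier is stricter: it stops matching once more than three non-word characters sit between the brace and name.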
If events are not multi-line:
I would suggest using LINE_BREAKER instead of BREAK_ONLY_BEFORE, because LINE_BREAKER will improve processing speed. If you would like to use LINE_BREAKER, the configs are below:
LINE_BREAKER = ([\r\n]+)\{\W+name
SHOULD_LINEMERGE = false
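To see what this LINE_BREAKER does to the two layouts discussed in this thread, here is a small simulation. It is only a sketch: `split_events` is a hypothetical helper that mimics the documented rule (the text matched by the first capture group is discarded and the event boundary falls there), run with Python's re rather than Splunk itself.

```python
import re

def split_events(raw, line_breaker):
    """Split raw text at each match, discarding whatever capture group 1 matched."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])  # previous event ends where group 1 begins
        start = m.end(1)                      # next event starts where group 1 ends
    events.append(raw[start:])
    return events

LINE_BREAKER = r'([\r\n]+)\{\W+name'
first = r'{\"name\":\"\",\"severity\":\"info\"}'
second = r'{\"name\":\"\",\"level\":30}'

on_separate_lines = split_events(first + "\n" + second, LINE_BREAKER)
on_one_line = split_events(first + second, LINE_BREAKER)

print(len(on_separate_lines))  # 2
print(len(on_one_line))        # 1
```

Because the capture group requires at least one newline, records that are concatenated on a single line stay together as one event under this pattern.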
Hi Surya,
We tried almost all of the suggestions you provided, but nothing seems to be working. Only a few events are being parsed; most are not. The SED command that I am applying works for all the events, but the regex does not. I have not tried LINE_BREAKER yet, though. Will it work?
Okay, I see what you're doing. I will provide two sets of configs: one for multi-line events and another for single-line events. Please apply the set that fits your use case.
Multi-line events (records where the next name starts on the same line):
{\"name\":\"\",\"\":,\"severity\":\"info\",\"time\":,\"host\":\"\",\"hostname\":\"\",\"\":\"\",\"\":\"UNKNOWN CORRELATION\",\"userId\":\"UNKNOWN USER\",\"moduleName\":\"\",\"\":\"a\",\"client\":\"AgentDesktop\",\"type\":\"application\",\"msg\":\"\",\"\":\"\"}{\"name\":\"\",\"level\":30,\"\":\"info\",\"time\":,\"host\":\"\",\"hostname\":\"\",\"\":\"\",\"clientCorrelationId\":\"\",\"userId\":\"UNKNOWN
props.conf:
[your_sourcetype]
BREAK_ONLY_BEFORE = (?m)\{\W*name
SHOULD_LINEMERGE = true
SEDCMD-backslash=s/\\//g
DATETIME_CONFIG = CURRENT
KV_MODE = json
category = Structured
NO_BINARY_CHECK = true
TRUNCATE = 0
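For reference, the effect of SEDCMD-backslash=s/\\//g on a record can be approximated in Python. This is a sketch, not Splunk's own code path: re.sub stands in for the sed-style substitution, and the field values are taken from the sample log later in this thread.

```python
import json
import re

# A record as it arrives, with a backslash in front of every quote.
raw = r'{\"name\":\"utterance.service logger\",\"level\":30,\"severity\":\"info\"}'

# Equivalent of s/\\//g: the regex \\ matches one literal backslash, replaced with nothing.
cleaned = re.sub(r'\\', '', raw)

# Once the backslashes are gone, the record parses as ordinary JSON.
record = json.loads(cleaned)

print(cleaned)
print(record["severity"])  # info
```

Note that a substitution like this removes every backslash, including legitimate JSON escapes inside string values, so it is worth checking whether any msg values depend on them.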
Single-line events (records where each name starts on a new line):
{\"name\":\"\",\"\":,\"severity\":\"info\",\"time\":,\"host\":\"\",\"hostname\":\"\",\"\":\"\",\"\":\"UNKNOWN CORRELATION\",\"userId\":\"UNKNOWN USER\",\"moduleName\":\"\",\"\":\"a\",\"client\":\"AgentDesktop\",\"type\":\"application\",\"msg\":\"\",\"\":\"\"}
{\"name\":\"\",\"level\":30,\"\":\"info\",\"time\":,\"host\":\"\",\"hostname\":\"\",\"\":\"\",\"clientCorrelationId\":\"\",\"userId\":\"UNKNOWN
props.conf:
[your_sourcetype]
LINE_BREAKER = ([\r\n]+)\{\W*name
SHOULD_LINEMERGE = false
SEDCMD-backslash=s/\\//g
DATETIME_CONFIG = CURRENT
KV_MODE = json
category = Structured
NO_BINARY_CHECK = true
TRUNCATE = 0
You can test the regex for both BREAK_ONLY_BEFORE and LINE_BREAKER with their respective data samples here.
Also, in your configuration you're using both INDEXED_EXTRACTIONS and KV_MODE to extract JSON fields. This is not recommended, because the fields are extracted twice (once at index time and once at search time), resulting in duplicate field values. Please have a look at the links below and use whichever one of the two settings suits your need.
https://answers.splunk.com/answers/556279/why-would-indexed-extractionsjson-in-propsconf-be.html
https://www.hurricanelabs.com/blog/splunk-case-study-indexed-extractions-vs-search-time-extractions
Hi Surya, the solution that you provided yesterday works only for events that start on a new line. For events that are merged into a single line, it does not work. Will the above stanza work for those merged events within a single line too?
Yes. Use the first set of configs. I am not sure why it did not work the first time. Can you paste the full props.conf that you're using right now? Please use the "code generator" (the icon with 101010) for pasting content.
[empath_app_log]
pulldown_type = 1
SEDCMD-backslash=s/\\//g
TRUNCATE = 0
BREAK_ONLY_BEFORE = \{\W+name
DATETIME_CONFIG = CURRENT
INDEXED_EXTRACTIONS = json
KV_MODE = json
category = Structured
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
This is what we deployed last night. Only the events starting on a new line are being parsed, while events merged together on a single line are not.
{"name":"utterance.service logger","level":30,"severity":"info","time":"host":"","hostname":"","category":"application","clientCorrelationId":"","userId":"","moduleName":"DisplayUtterancesFsModule","source":"angular","client":"AgentDesktop","type":"application","msg":"utterance does not exist","logId":""}{"name":"utterance.service logger","level":30,"severity":"info","time":,"host":"","hostname":"","category":"application","clientCorrelationId":"","userId":"","moduleName":"","source":"angular","client":"AgentDesktop","type":"application","msg":"utterance does not exist","logId":""}
Above is the sample log that is not being parsed. I pulled it from the Splunk UI.
Thanks for the information. Please add the (?m) multi-line modifier before \{\W+name. This will make Splunk look at each line for the {"name string.
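One possible reading of the symptom, going by the props.conf documentation, is that BREAK_ONLY_BEFORE is evaluated against lines that LINE_BREAKER has already produced, so two records sharing one line are never seen as separate lines. Under that assumption, a LINE_BREAKER that also breaks between } and { might help. The sketch below simulates this in Python; the pattern and the `split_events` helper are hypothetical and have not been verified against Splunk.

```python
import re

def split_events(raw, line_breaker):
    """Split raw text at each match, discarding whatever capture group 1 matched."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return events

# Hypothetical pattern: break after a closing brace, with or without newlines,
# whenever the next record opens with {, optional non-word characters, then name.
CANDIDATE = r'\}([\r\n]*)\{\W*name'

a = '{"name":"utterance.service logger","level":30}'
b = '{"name":"utterance.service logger","severity":"info"}'
c = '{"name":"utterance.service logger","type":"application"}'

# Records a and b share a line; c starts on a new line.
events = split_events(a + b + "\n" + c, CANDIDATE)

print(len(events))  # 3
```

If this approach were used, SHOULD_LINEMERGE would presumably be set to false, as in the single-line config above, so that the line breaker alone decides the event boundaries.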
Oops! I applied that as well. Below is the stanza that is on the server, and it is still not working as I expected.
[empath_app_log]
pulldown_type = 1
SEDCMD-backslash=s/\\//g
TRUNCATE = 0
BREAK_ONLY_BEFORE = (?m){\W+name
DATETIME_CONFIG = CURRENT
INDEXED_EXTRACTIONS = json
KV_MODE = json
category = Structured
SHOULD_LINEMERGE = true
NO_BINARY_CHECK = true
Hmm. Can you check whether any other setting is taking precedence by running this command: splunk btool props list empath_app_log --debug
Do you mind walking me through your architecture? Is the data flow UF --> HF --> Indexer?
The data flow is from the deployment server to the heavy forwarder to the indexers.
Are you collecting logs from the deployment server? In that case, please place the same props.conf along with your inputs.conf on the DS as well. What was the output of the btool command? Did you notice any conflicts?
I am unable to run that command. I don't have that privilege.