Valid JSON gets truncated for some reason. Below is my props.conf file:
TRUNCATE = 0
KV_MODE = json
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^\x7B
LINE_BREAKER = ([\r\n]+)(\x7B)
SHOULD_LINEMERGE = false
DATETIME_CONFIG = CURRENT
Any suggestions?
Everything gets truncated eventually, unless you use the (somewhat dangerous) TRUNCATE = 0
setting. Raise your value for TRUNCATE:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Propsconf
#******************************************************************************
# Line breaking
#******************************************************************************
# Use the following attributes to define the length of a line.
TRUNCATE = <non-negative integer>
* Change the default maximum line length (in bytes).
* Although this is in bytes, line length is rounded down when this would
otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often a sign of
garbage data).
* Defaults to 10000 bytes.
You should be getting logs like this:
01-01-2020 18:40:37.625 +0000 WARN LineBreakingProcessor - Truncating line because limit of 10000 has been exceeded
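To make the byte-based limit in the props.conf excerpt above a bit more concrete, here is a minimal Python sketch of "truncate to N bytes, but round down instead of cutting a multi-byte character in half". It only illustrates the documented behavior; it is not Splunk's implementation, and the function name `truncate_bytes` and the UTF-8 encoding are my own assumptions.

```python
def truncate_bytes(line, limit):
    """Cut `line` to at most `limit` UTF-8 bytes; a limit of 0 means no truncation."""
    if limit == 0:
        return line
    data = line.encode("utf-8")
    if len(data) <= limit:
        return line
    # errors="ignore" drops an incomplete multi-byte sequence left at the cut,
    # which has the effect of rounding the length down to a character boundary.
    return data[:limit].decode("utf-8", errors="ignore")

# "é" takes two bytes in UTF-8, so a 2-byte limit cannot keep half of it.
print(truncate_bytes("aé", 2))
print(truncate_bytes("a" * 12000, 10000) == "a" * 10000)
```

Note that the limit counts bytes, while len() in Python counts characters, so the two numbers differ as soon as the event contains non-ASCII text.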
Hi,
No, there isn't any log record about truncation due to length. I set TRUNCATE = 0 to rule out any potential issue due to length. The intent is to set it to 30000 once I figure out why the events get truncated.
All error messages are like the one below but with different values:
ERROR JsonLineBreaker - JSON StreamId:18294845293918380307 had parsing error:Unexpected character while looking for value: 'r' - data_source="/opt/splunk/vne2splunk/log.json", data_host="splmx1.sample.com", data_sourcetype="_json"
Some logs are parsed correctly like the one below:
{
"audit": "16489",
"hostScore": "0",
"name": "to8pt.sample.com",
"macAddress": "",
"os": "OS Undetermined",
"vulnerabilities": "1",
"netbiosName": "",
"application": {
"": "port - 5040",
"id: 6119 Application: DCE/MS RPC Endpoint Mapper Interface (TCP) description: DCE/MS RPC Endpoint Mapper Interface. parent: 165": "port - 135",
"id: 165 Service: DCE/MS RPC over TCP description: Microsoft RPC (Remote Procedure Call) over TCP is used by many services, including: DHCP Manager, DNS Administration, WINS Manager, Exchange Client/Server, Exchange Administrator and RPC. Third party applications, such as Symantec/Veritas BackupExec, may also make use of it. protocol: tcp transport: n/a parentid: n/a": "port - 135",
"id: 8037 Service: IPv4 Layer 4 description: Generic Layer 3 / Layer 4 RAW socket access. protocol: ip transport: n/a parentid: n/a": "port - 0"
},
"timeStamp": "2020-01-02 00:03:56",
"ipAddress": "172.16.25.32",
"id": "4128157",
"network": "INT - Transports"
}
The only difference is that the "application" object varies in length. In one example, the event in Splunk is truncated at 14,532 characters, but the original JSON has 15,071 characters.
This leads me to believe that the issue is related to some character sequence but not sure which one.
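One way to narrow down the character sequence outside Splunk is to run the source file through a JSON decoder and see whether, and where, it stops. The sketch below is only a rough check under a few assumptions: the file holds JSON objects written back to back, Python's json module is a reasonable stand-in for whatever parser Splunk uses, and the helper name `first_json_error` is made up for this post.

```python
import json

def first_json_error(text):
    """Decode back-to-back JSON values in `text`.

    Returns (objects_parsed, error_offset, context); error_offset is None
    when everything parses.
    """
    decoder = json.JSONDecoder()
    pos, parsed = 0, 0
    while True:
        # Skip whitespace (including newlines) between objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return parsed, None, ""
        try:
            _, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError as err:
            # err.pos is the character offset where the decoder gave up.
            return parsed, err.pos, text[max(0, err.pos - 20):err.pos + 20]
        parsed += 1

# Made-up sample: one valid object followed by stray text.
sample = '{"audit": "16489"}\nrest of a broken record'
print(first_json_error(sample))
```

To try it on the real data, read the file (for example /opt/splunk/vne2splunk/log.json from the error message) into a string and pass it in. If every object parses, the source file itself is fine and the problem is more likely in how the events are being split.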
Could you try running | eval eventlength = len(_raw)
to see if Splunk truncates at the same position every time?
Hi rvaglid,
I ran the suggested eval on a few entries and the truncation position is not consistent:
11170
12231
13721
11331
As I mentioned above, the truncation doesn't appear to be caused by length but rather by a character sequence.
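Since the cut-off position varies, it may help to look at what sits on either side of each cut. Here is a small Python sketch, assuming you can copy both the original JSON and the _raw text of the truncated event into strings; `cut_context` is a hypothetical helper name and the sample data is made up.

```python
def cut_context(original, indexed, width=30):
    """Return (offset, before, after) around the point where `indexed`
    stops matching `original`."""
    n = 0
    limit = min(len(original), len(indexed))
    # Walk forward while both strings agree.
    while n < limit and original[n] == indexed[n]:
        n += 1
    return n, original[max(0, n - width):n], original[n:n + width]

# Made-up example: the indexed copy ends right before a nested object.
original = '{"a": "x", "b": {"c": 1}}'
indexed = '{"a": "x", "b": '
offset, before, after = cut_context(original, indexed)
print(offset, repr(before), repr(after))
```

If the text just after the cut looks the same for every truncated event, that is a strong hint about which sequence is being treated as an event boundary.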
how about this?
I've also tried a few alternate line breakers with no success:
LINE_BREAKER = ([\r\n]+)(\x7B(\x22))audit
LINE_BREAKER = ([\r\n]+)(\x7B)audit
LINE_BREAKER = ([\r\n]*)(?={)
no line breaker and INDEXED_EXTRACTIONS = json
Below is the beginning of the JSON (truncated here to keep the post clean):
{"audit":"16463","hostScore":"0","name":"to8pt.sample.com","macAddress":"","os":"OS Undetermined",
LINE_BREAKER = ([\r\n]*)(?=\{)
How about this?
You should escape the "{" character.
It didn't work either.
TRUNCATE = 0
KV_MODE = json
NO_BINARY_CHECK = true
LINE_BREAKER = ([\r\n]+)(?=\{)
SHOULD_LINEMERGE = false
DATETIME_CONFIG = CURRENT
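A quick way to sanity-check LINE_BREAKER candidates outside Splunk is to run them against a sample and list where each one would split. The sketch below uses Python's re module, which is not the regex engine Splunk uses, so treat it as an approximation. It relies on the documented rule that the next event starts where the first capture group ends; the sample data and the helper name `break_offsets` are made up.

```python
import re

# Two pretty-printed events, each with a nested object (made-up data).
SAMPLE = (
    '{\n"audit": "1",\n"application": {\n"": "port - 5040"\n}\n}\n'
    '{\n"audit": "2",\n"application": {\n"": "port - 135"\n}\n}\n'
)

def break_offsets(pattern, text):
    """Offsets where a new event would start: the end of capture group 1."""
    return sorted({m.end(1) for m in re.finditer(pattern, text)})

for pattern in (r'([\r\n]+)(\x7B)', r'([\r\n]+)(?=\{)', r'([\r\n]*)(?=\{)'):
    offsets = break_offsets(pattern, SAMPLE)
    # Show a few characters at each boundary to see what the event starts with.
    print(pattern, [(o, SAMPLE[o:o + 12]) for o in offsets])
```

On this sample the two `+` patterns only split where a newline is directly followed by `{`, while the `*` variant can match zero newlines and therefore also reports a boundary at the nested `"application": {` brace.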
How many logs are actually there, and how many are truncated?
Also, is it LINE_BREAKER that doesn't work?
I think it's different from your question.