I have HEC messages that are indexed with the sourcetype _json. This is a built-in Splunk sourcetype and has the following configuration:
[_json]
pulldown_type = true
INDEXED_EXTRACTIONS = json
KV_MODE = none
category = Structured
description = JavaScript Object Notation format. For more information, visit http://json.org/
I have a problem, however, with the length of the indexed fields: they are truncated to 1000 characters. I can't figure out which setting I should change to increase that limit.
To give a bit more context, the HEC messages that I receive are roughly structured as follows:
{
"id": "35298092067921924966859073695563957796481621929900441603",
"level": "INFO",
"message": "2020-02-27T16:33:10.666Z e18c650c-7d2d-4acc-bf9c-bfbb1fd0cec4 INFO {\"message\":\"Error while ... \"}"
}
So we actually have an extracted field called message (as well as id and level), but that field can be rather long and is truncated at 1000 characters.
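For reference, here is a small Python sketch (the payload values mirror the example above; the construction itself is my own illustration) that builds this nested HEC payload and shows that the inner message field easily exceeds 1000 characters before Splunk ever sees it:

```python
import json

# Build a HEC-style event payload like the one above; the inner
# "message" field itself contains an escaped JSON string.
inner = json.dumps({"message": "Error while ... " + "x" * 2000})
event = {
    "id": "35298092067921924966859073695563957796481621929900441603",
    "level": "INFO",
    "message": "2020-02-27T16:33:10.666Z e18c650c-7d2d-4acc-bf9c-bfbb1fd0cec4 INFO " + inner,
}
payload = json.dumps({"event": event, "sourcetype": "_json"})

# Round-trip the payload: the full field value survives JSON
# encoding, so any 1000-character cut must happen inside Splunk.
decoded = json.loads(payload)
print(len(decoded["event"]["message"]))  # well over 1000
```

This rules out truncation on the sending side; the full value arrives at the HEC endpoint intact.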
I've tried to find this in the limits.conf documentation, but I cannot find a definitive value there. Can somebody help me out?
Hi @aukevanleeuwen , wondering if you were ever able to get indexed field extractions >1000 characters to work?
From here:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf
Maybe this?
[kv]
avg_extractor_time = <integer>
* Maximum amount of CPU time, in milliseconds, that the average (over search
results) execution time of a key-value pair extractor will be allowed to take
before warning. Once the average becomes larger than this amount of time a
warning will be issued
* Default: 500 (.5 seconds)
limit = <integer>
* The maximum number of fields that an automatic key-value field extraction
(auto kv) can generate at search time.
* The summary fields 'host', 'index', 'source', 'sourcetype', 'eventtype',
'linecount', 'splunk_server', and 'splunk_server_group' do not count against
this limit and will always be returned.
* Increase this setting if, for example, you have data with a large
number of columns and want to ensure that searches display all fields extracted
from an automatic key-value field (auto kv) configuration.
* Set this value to 0 if you do not want to limit the number of fields
that can be extracted at index time and search time.
* Default: 100
indexed_kv_limit = <integer>
* The maximum number of fields that can be extracted at index time from a data source.
* Fields that can be extracted at index time include default fields, custom fields,
and structured data header fields.
* The summary fields 'host', 'index', 'source', 'sourcetype', 'eventtype', 'linecount',
'splunk_server', and 'splunk_server_group' do not count against this limit and are
always returned.
* Increase this setting if, for example, you have indexed data with a large
number of columns and want to ensure that searches display all fields from
the data.
* Set this value to 0 if you do not want to limit the number of fields
that can be extracted at index time.
* Default: 200
maxchars = <integer>
* Truncate _raw to this size and then do auto KV.
* Default: 10240 characters
maxcols = <integer>
* When non-zero, the point at which kv should stop creating new fields.
* Default: 512
max_extractor_time = <integer>
* Maximum amount of CPU time, in milliseconds, that a key-value pair extractor
will be allowed to take before warning. If the extractor exceeds this
execution time on any event a warning will be issued
* Default: 1000 (1 second)
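To illustrate what maxchars does, here is a rough Python sketch (my own simplification, not Splunk's actual implementation) of "truncate _raw to this size and then do auto KV":

```python
import re

def auto_kv(raw, maxchars=10240):
    """Rough sketch: clip _raw to maxchars, then extract key=value pairs."""
    truncated = raw[:maxchars]
    return dict(re.findall(r'(\w+)=(\S+)', truncated))

# A long value near the cut-off gets clipped along with _raw.
raw = "level=INFO message=" + "a" * 200
fields = auto_kv(raw, maxchars=50)
print(len(fields["message"]))  # shorter than the original 200 characters
```

The point is that maxchars limits the total _raw length fed to auto KV, so a low value can silently shorten field values extracted near the end of the event.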
Try the following configuration:
props.conf
[_json]
TRUNCATE = 0
limits.conf
[kv]
maxchars = 1024000
Thanks for the answer. I've already set TRUNCATE to a value large enough for a single line, and maxchars is currently set higher than 1000, so that is unlikely to be the cause either.
Hmm... I'm trying to reproduce this using a plain splunk/splunk:latest Docker container, but there it seems to work, i.e. values longer than 1000 characters get the correct, non-truncated field value. Maybe it's something on the Splunk Cloud side?
@aukevanleeuwen try this search. It will look for "truncating" messages in the _internal splunk index and let you know the largest it has seen and give you a recommended value to put in props.
index="_internal" sourcetype=splunkd source="*splunkd.log" log_level="WARN" "Truncating"
| rex "line length >= (?<line_length>\d+)"
| stats values(host) as host values(data_host) as data_host count last(_raw) as common_events last(_time) as _time max(line_length) as max_line_length by data_sourcetype log_level
| table _time host data_host data_sourcetype log_level max_line_length count common_events
| rename data_sourcetype as sourcetype
| eval number=max_line_length
| eval recommended_truncate=max_line_length+100000
| eval recommended_truncate=recommended_truncate-(recommended_truncate%100000)
| eval recommended_config="# props.conf
["+sourcetype+"]
TRUNCATE = "+recommended_truncate
| table _time host data_host sourcetype log_level max_line_length recommended_truncate recommended_config count common_events
| sort -count
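The TRUNCATE recommendation in that search just adds head-room and rounds down to a multiple of 100000. A Python sketch of the same eval arithmetic:

```python
def recommended_truncate(max_line_length, headroom=100000):
    """Mirror of the eval logic above: add head-room, then round
    down to a multiple of the head-room (100000 by default)."""
    value = max_line_length + headroom
    return value - (value % headroom)

print(recommended_truncate(123456))  # 200000
```

So a longest observed line of 123456 characters yields a suggested TRUNCATE of 200000.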
Credit goes to @rob_jordan as I found what he posted in this Answer: https://answers.splunk.com/answers/155691/why-are-larger-events-are-truncated-10000-bytes.html
Even for pretty big values such as 200K characters.
Hi,
I'm also not sure about the issue, but as far as I understand you can try the following:
- remove the INDEXED_EXTRACTIONS = json line
- add KV_MODE = json
Let me know whether it works.