I have HEC messages that are indexed with the sourcetype _json. This is a built-in Splunk sourcetype and has the following configuration:
[_json]
pulldown_type = true
INDEXED_EXTRACTIONS = json
KV_MODE = none
category = Structured
description = JavaScript Object Notation format. For more information, visit http://json.org/
I have a problem, however, with the length of the indexed fields: they are truncated to 1000 characters. I can't figure out which setting I should change to increase that limit.
To give a bit more context, the HEC messages that I receive are roughly structured as follows:
{
"id": "35298092067921924966859073695563957796481621929900441603",
"level": "INFO",
"message": "2020-02-27T16:33:10.666Z e18c650c-7d2d-4acc-bf9c-bfbb1fd0cec4 INFO {\"message\":\"Error while ... \"}"
}
So we actually have an extracted field called message (as well as id and level), but that field can be rather long and is truncated at 1000 characters.
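For reference, here is a small Python sketch (the payload values mirror the example above; the construction itself is my own illustration) that builds this nested HEC payload and shows that the inner message field easily exceeds 1000 characters before Splunk ever sees it:

```python
import json

# Build a HEC-style event payload like the one above; the inner
# "message" field itself contains an escaped JSON string.
inner = json.dumps({"message": "Error while ... " + "x" * 2000})
event = {
    "id": "35298092067921924966859073695563957796481621929900441603",
    "level": "INFO",
    "message": "2020-02-27T16:33:10.666Z e18c650c-7d2d-4acc-bf9c-bfbb1fd0cec4 INFO " + inner,
}
payload = json.dumps({"event": event, "sourcetype": "_json"})

# Round-trip the payload: the full field value survives JSON
# encoding, so any 1000-character cut must happen inside Splunk.
decoded = json.loads(payload)
print(len(decoded["event"]["message"]))  # well over 1000
```

This rules out truncation on the sending side; the full value arrives at the HEC endpoint intact.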
I've tried to find this in the limits.conf documentation, but I cannot find a definitive value there. Can somebody help me out?
Hi @aukevanleeuwen , wondering if you were ever able to get indexed field extractions >1000 characters to work?
From here:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf
Maybe this?
[kv]
avg_extractor_time = <integer>
* Maximum amount of CPU time, in milliseconds, that the average (over search
results) execution time of a key-value pair extractor will be allowed to take
before warning. Once the average becomes larger than this amount of time a
warning will be issued
* Default: 500 (.5 seconds)
limit = <integer>
* The maximum number of fields that an automatic key-value field extraction
(auto kv) can generate at search time.
* The summary fields 'host', 'index', 'source', 'sourcetype', 'eventtype',
'linecount', 'splunk_server', and 'splunk_server_group' do not count against
this limit and will always be returned.
* Increase this setting if, for example, you have data with a large
number of columns and want to ensure that searches display all fields extracted
from an automatic key-value field (auto kv) configuration.
* Set this value to 0 if you do not want to limit the number of fields
that can be extracted at index time and search time.
* Default: 100
indexed_kv_limit = <integer>
* The maximum number of fields that can be extracted at index time from a data source.
* Fields that can be extracted at index time include default fields, custom fields,
and structured data header fields.
* The summary fields 'host', 'index', 'source', 'sourcetype', 'eventtype', 'linecount',
'splunk_server', and 'splunk_server_group' do not count against this limit and are
always returned.
* Increase this setting if, for example, you have indexed data with a large
number of columns and want to ensure that searches display all fields from
the data.
* Set this value to 0 if you do not want to limit the number of fields
that can be extracted at index time.
* Default: 200
maxchars = <integer>
* Truncate _raw to this size and then do auto KV.
* Default: 10240 characters
maxcols = <integer>
* When non-zero, the point at which kv should stop creating new fields.
* Default: 512
max_extractor_time = <integer>
* Maximum amount of CPU time, in milliseconds, that a key-value pair extractor
will be allowed to take before warning. If the extractor exceeds this
execution time on any event a warning will be issued
* Default: 1000 (1 second)
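To illustrate what maxchars does, here is a rough Python sketch (my own simplification, not Splunk's actual implementation) of "truncate _raw to this size and then do auto KV":

```python
import re

def auto_kv(raw, maxchars=10240):
    """Rough sketch: clip _raw to maxchars, then extract key=value pairs."""
    truncated = raw[:maxchars]
    return dict(re.findall(r'(\w+)=(\S+)', truncated))

# A long value near the cut-off gets clipped along with _raw.
raw = "level=INFO message=" + "a" * 200
fields = auto_kv(raw, maxchars=50)
print(len(fields["message"]))  # shorter than the original 200 characters
```

The point is that maxchars limits the total _raw length fed to auto KV, so a low value can silently shorten field values extracted near the end of the event.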
Try the following configuration:
props.conf
[_json]
TRUNCATE = 0
limits.conf
[kv]
maxchars = 1024000
Thanks for the answer. I've already set TRUNCATE to a value large enough for a single line, and maxchars is currently set higher than 1000, so that is unlikely to be the cause either.
Hmm... I'm trying to reproduce this using a plain splunk/splunk:latest Docker container, but there it seems to work, i.e. values longer than 1000 characters get the correct, non-truncated field value. Maybe it's something on the Splunk Cloud side?
@aukevanleeuwen try this search. It will look for "truncating" messages in the _internal splunk index and let you know the largest it has seen and give you a recommended value to put in props.
index="_internal" sourcetype=splunkd source="*splunkd.log" log_level="WARN" "Truncating"
| rex "line length >= (?<line_length>\d+)"
| stats values(host) as host values(data_host) as data_host count last(_raw) as common_events last(_time) as _time max(line_length) as max_line_length by data_sourcetype log_level
| table _time host data_host data_sourcetype log_level max_line_length count common_events
| rename data_sourcetype as sourcetype
| eval number=max_line_length
| eval recommended_truncate=max_line_length+100000
| eval recommended_truncate=recommended_truncate-(recommended_truncate%100000)
| eval recommended_config="# props.conf
["+sourcetype+"]
TRUNCATE = "+recommended_truncate
| table _time host data_host sourcetype log_level max_line_length recommended_truncate recommended_config count common_events
| sort -count
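The TRUNCATE recommendation in that search just adds head-room and rounds down to a multiple of 100000. A Python sketch of the same eval arithmetic:

```python
def recommended_truncate(max_line_length, headroom=100000):
    """Mirror of the eval logic above: add head-room, then round
    down to a multiple of the head-room (100000 by default)."""
    value = max_line_length + headroom
    return value - (value % headroom)

print(recommended_truncate(123456))  # 200000
```

So a longest observed line of 123456 characters yields a suggested TRUNCATE of 200000.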
Credit goes to @rob_jordan as I found what he posted in this Answer: https://answers.splunk.com/answers/155691/why-are-larger-events-are-truncated-10000-bytes.html
Even for pretty big values such as 200K characters.
Hi,
I'm also not sure about the issue, but as far as I understand you can try the following:
- remove the INDEXED_EXTRACTIONS = json line
- add KV_MODE = json
Let me know whether it works.