field extraction very long field

gtonti
Explorer

I have a log file that sometimes contains a very long field.
A row of my log is:
018-07-31 10:22:38.8701 inoutLogger level="ERROR" timestamp="31/07/2018 10:22:38" Elapsed_ms="1218.7727" richiesta='"<?xml version="1.0" encoding="utf-16"?><my very long xml>"'

My props.conf is:
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE=false
TRUNCATE=0
pulldown_type = 1

Sometimes, when I search on the field "richiesta", the value comes back truncated.

search xxxx | table richiesta

I get only part of the XML (e.g. "<?xml version="1.0").

Any suggestions?

Thanks
Gianluca


kmorris_splunk
Splunk Employee

gtonti
Explorer

Hi kmorris,

my props.conf already has TRUNCATE=0. According to the documentation, Splunk should never truncate with that setting.

Kind Regards
Gianluca


sudosplunk
Motivator

Did you define TRUNCATE=0 under the same stanza as your sourcetype or source? I am asking to rule out any precedence issues.


gtonti
Explorer

hi nittala_surya,

my props.conf is

[invest-be-inout-crg]
BREAK_ONLY_BEFORE=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE=false
TRUNCATE=0
pulldown_type = 1
TRANSFORMS-filter_logs = extract_fields-invest-be-inout,extract_fields_Source_Wel

In my search query I set sourcetype=invest-be-inout-crg, so I think TRUNCATE is in the correct place. Is there a way to check this in the UI?

By the way, I will try TRUNCATE=20000 and see whether that solves the problem.

Thanks
regards
Gianluca


thambisetty
SplunkTrust

Hi,

Are you able to see the full value of the richiesta field in the raw log?

Is the value truncated only when you display it in a table?


gtonti
Explorer

Hi thambisetty,

If I run the query "search xxxx | fields richiesta", the field is truncated even though the raw data is complete.

I also ran a similar query: "search xxxx | eval len=len(_raw) | eval len_rich=len(richiesta) | table richiesta len len_rich"

It looks to me like the field is truncated whenever the raw event is longer than 10000 characters: it is always truncated when the length is > 10000 and never truncated when the length is less than 10000.
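
To check this more systematically, a search along the following lines could group the events by whether _raw is longer than 10000 characters and compare the extracted field lengths. This is only a sketch: "xxxx" stands for my real search terms, and raw_len, rich_len and raw_over_10000 are names I made up for illustration.

```
search xxxx
| eval raw_len=len(_raw), rich_len=len(richiesta)
| eval raw_over_10000=if(raw_len > 10000, "yes", "no")
| stats count, min(rich_len), max(rich_len) by raw_over_10000
```

If the "yes" group always shows a much smaller rich_len than the "no" group, that would support the idea that the truncation depends on the event length and not on the content of the XML.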

Bye


thambisetty
SplunkTrust

That means the field value is not being extracted as you expect. If you can post a sample raw event, I can help you with a regex to extract the richiesta field value.


gtonti
Explorer

Hi thambisetty,

I didn't define a transform because, according to Splunk, the log is already written as key='value' pairs, so the value should be extracted automatically.
Writing a transform is an option I can consider if I don't find another solution.

Thanks
Kind regards
Gianluca


sudosplunk
Motivator

If you want to extract the values at search time, you can use the spath command like this: search xxxx | spath input=richiesta. This extracts all the XML fields automatically.

If this is not what you're looking for, please provide some sample events and I can help you with regular expressions to extract the fields using props.conf.


gtonti
Explorer

Hi nittala_surya,

thank you for your reply. What I don't understand is why the field richiesta contains only part of the XML.
The spath command is very interesting.

Kind regards


sudosplunk
Motivator

What do you mean by "a part of xml"? Do you mean that only the richiesta field has XML data and the rest of the raw event is plain text?


gtonti
Explorer

My field richiesta in the log file is:
.... richiesta='"<?xml version="1.0" encoding="utf-16"?><MemoConsulenzaRequest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><ZSRVEXT><Username xmlns="http://www.cadit.it/MW/MWGSSRE">ut27537</Username></ZSRVEXT></MemoConsulenzaReques...' .....

If I query richiesta at search time, I get only the first part of it: "<?xml version="1.0"


sudosplunk
Motivator

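I think the default values in the [kv] (key-value) stanza are the reason for the truncation. According to limits.conf, these are the defaults. Check whether any of them apply to "richiesta". In particular, the maxchars default of 10240 is close to the 10000-character threshold you observed.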

avg_extractor_time = <integer>
* Maximum amount of CPU time, in milliseconds, that the average (over search
  results) execution time of a key-value pair extractor will be allowed to take
  before warning. Once the average becomes larger than this amount of time a
  warning will be issued
* Default: 500 (.5 seconds)

limit = <integer>
* The maximum number of fields that an automatic key-value field extraction
  (auto kv) can generate at search time.
* If search-time field extractions are disabled (KV_MODE=none in props.conf)
  then this setting determines the number of index-time fields that will be
  returned.
* The summary fields 'host', 'index', 'source', 'sourcetype', 'eventtype',
  'linecount', 'splunk_server', and 'splunk_server_group' do not count against
  this limit and will always be returned.
* Increase this setting if, for example, you have indexed data with a large
  number of columns and want to ensure that searches display all fields from
  the data.
* Default: 100

maxchars = <integer>
* Truncate _raw to this size and then do auto KV.
* Default: 10240 characters

maxcols = <integer>
* When non-zero, the point at which kv should stop creating new fields.
* Default: 512

max_extractor_time = <integer>
* Maximum amount of CPU time, in milliseconds, that a key-value pair extractor
  will be allowed to take before warning. If the extractor exceeds this
  execution time on any event a warning will b

Just a note: the longer and more complicated your events, the more you would get out of hand-coding the field extractions. The auto extractor, while "correct", does not necessarily produce the most efficient regular expressions for the data.
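
If maxchars does turn out to be the culprit, an override along these lines might be worth trying. Treat it as a sketch I have not tested against your data: the value 100000 and the file location are assumptions, and since [kv] governs search-time extraction, I would expect the change to belong on the search head.

```
# Assumed location: $SPLUNK_HOME/etc/system/local/limits.conf
# (or the local directory of an app)
[kv]
# Default is 10240. The value below is illustrative only;
# choose something larger than your longest event.
maxchars = 100000
```

After changing it, rerun the search with len(_raw) and len(richiesta) on an event longer than 10000 characters to see whether the field now comes back complete. Keep in mind that raising the limit makes auto KV scan more of each event, so it may cost some search performance.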
