Truncate events in props.conf to reduce the license cost

damucka
Builder

Hello,

I have the following stanza in my props.conf for the relevant sourcetype:

[BWP_hanatraces]
TRUNCATE = 0
TRANSFORMS-BWP_parameterChangelog_clone
TRANSFORMS-eliminatedebug = setnull

The corresponding logs are database logs, so they very often contain full SQL statement texts, which can be really long.
What I would like to achieve is an upper limit for a single event of e.g. 50 lines or 5,000 characters, whichever is reached first. Also, this should not be merely a splitting criterion: the rest of the event should be discarded.

My questions:
- How would I do this, and is it possible to apply it only to my sourcetype BWP_hanatraces?
- Would it lower the license costs? In short, would the event truncation happen before the Splunk license usage gets calculated?

Kind regards,
Kamil

1 Solution

FrankVl
Ultra Champion

TRUNCATE will not split events; it does just what it says: it truncates. But it works in bytes (which typically align roughly with characters), not lines. So you could use TRUNCATE for the 5000-character limit.

For the line limit, you could devise some kind of SEDCMD that strips off anything after 50 newlines. So together, put this in props.conf:

[BWP_hanatraces]
TRUNCATE = 5000
SEDCMD-truncate = s/((?:[^\r\n]*[\r\n]+){50}).*/\1/

Alternatively, you could also see if you can come up with a SEDCMD that in general strips out the whole query, but perhaps that is not what you want?
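
To sanity-check the pattern before deploying it, here is a minimal Python sketch that applies the same regex to a made-up 80-line event. It only illustrates the regex logic and is not Splunk's implementation. The names `KEEP_50_LINES` and `keep_first_50_lines` are hypothetical. Note that in Python "." does not match newlines unless re.DOTALL is set; whether the SEDCMD itself needs an inline (?s) for the trailing ".*" to swallow the remaining lines is worth testing on sample data.

```python
import re

# Same pattern as the SEDCMD above. re.DOTALL lets the trailing ".*"
# consume the remaining lines (Python's "." skips newlines by default).
KEEP_50_LINES = re.compile(r"((?:[^\r\n]*[\r\n]+){50}).*", re.DOTALL)


def keep_first_50_lines(event: str) -> str:
    """Drop everything after the 50th line break; shorter events are unchanged."""
    return KEEP_50_LINES.sub(r"\1", event, count=1)


# Hypothetical event with 80 lines.
sample = "".join(f"line {i}\n" for i in range(1, 81))
trimmed = keep_first_50_lines(sample)

print(trimmed.count("\n"))            # number of line breaks kept
print(trimmed.splitlines()[-1])       # last line that survived
```

Because `[\r\n]+` matches a run of line breaks, consecutive blank lines count as a single break, so the limit is really "50 line-break runs" rather than exactly 50 physical lines.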

chris_barrett
Communicator

Ref: https://docs.splunk.com/Documentation/Splunk/7.2.6/Admin/HowSplunklicensingworks

How data is metered
For event data, data volume is based on the amount of raw external data that the indexer ingests into its indexing pipeline, after any filtering. It is not based on the amount of compressed data that gets written to disk. For metrics data, each metric event counts as a fixed 150 bytes. Metrics data does not use a separate license. Rather, it draws from the same license quota as event data.

The key phrase above is "after any filtering". The TRUNCATE and TRANSFORMS operations occur within the parsing pipeline, which comes before the indexing pipeline (https://docs.splunk.com/Documentation/Splunk/7.2.6/Indexer/Howindexingworks), so data they remove should not count against the license.

damucka
Builder

Hello @chris_barrett

Thank you.
And how would I restrict each event to a maximum of 50 lines and 5000 characters?
As far as I understand, TRUNCATE will just split long events into several smaller ones, which is not what I want. I would like to reduce the amount of data per event by discarding everything beyond the limits above.

Kind Regards,
Kamil

ddrillic
Ultra Champion

@damucka -

-- As far as I understand the TRUNCATE will just split the long events into several smaller

Not really.

props.conf.spec (https://docs.splunk.com/Documentation/Splunk/7.2.6/Admin/Propsconf) says:

TRUNCATE = <non-negative integer>
* Change the default maximum line length (in bytes).
* Although this is in bytes, line length is rounded down when this would
otherwise land mid-character for multi-byte characters.
* Set to 0 if you never want truncation (very long lines are, however, often
a sign of garbage data).
* Defaults to 10000 bytes.
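
The "bytes, rounded down at a multi-byte character" rule can be illustrated with a short Python sketch. This is an assumption-laden model (UTF-8 input, a hypothetical helper named `truncate_bytes`), not Splunk's actual code:

```python
def truncate_bytes(text: str, limit: int) -> str:
    """Cut text to at most `limit` UTF-8 bytes without splitting a character.

    limit == 0 means "never truncate", mirroring TRUNCATE = 0.
    """
    if limit == 0:
        return text
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # errors="ignore" drops an incomplete multi-byte sequence at the cut point,
    # which has the effect of rounding the length down.
    return raw[:limit].decode("utf-8", errors="ignore")


print(truncate_bytes("ééé", 5))                          # each "é" is 2 bytes in UTF-8
print(len(truncate_bytes("x" * 6000, 5000).encode()))    # ASCII: bytes == characters
```

For plain ASCII logs the byte limit and the character limit coincide; they only diverge when the event contains multi-byte characters.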

Depending on the line-breaking setup, that "line" ends up being the event's total length, and it is very common, both for cases like yours and for Java exceptions, to use TRUNCATE to trim the event.

damucka
Builder

Thank you.

I tested the following setup in my props.conf:

[(?::){0}*hanatraces]
TRUNCATE = 1000
MAX_EVENTS = 50
TRANSFORMS-BWP_parameterChangelog_clone
TRANSFORMS-eliminatedebug = debugsetnull
TRANSFORMS-LogReplayCoordinator = LogReplaysetnull
TRANSFORMS-anon = anonymize-ip, anonymize-user

Now I have the following issue:
I noticed that lines get truncated after 1000 characters, which is fine, but the event is not truncated after 50 lines; it is split instead. As my goal is to reduce license usage, this does not help me. I would like big events to have a maximum of 50 lines, each at most 1000 characters. The rest of the event should be discarded, not split off.

Could you please advise how I would achieve this?

Kind Regards,
Kamil

chris_barrett
Communicator

I don't have a system here at home that I can test with but I believe that the following will do what you're after:
SHOULD_LINEMERGE = true
MAX_EVENTS = 50
TRUNCATE = 5000

You will however need to provide your own event breaking using BREAK_ONLY_BEFORE, MUST_BREAK_AFTER or similar.
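
Earlier in the thread, MAX_EVENTS = 50 was reported to split a long event rather than discard the excess. The following Python sketch contrasts the two outcomes in terms of retained bytes. It models only the reported behavior on a hypothetical 120-line event; the function names are made up for illustration and nothing here is Splunk's implementation.

```python
def split_every(lines, max_lines):
    """Model of the reported behavior: a long event becomes several events."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def keep_first(lines, max_lines):
    """Model of the desired behavior: lines beyond the limit are discarded."""
    return lines[:max_lines]


def total_bytes(events):
    """Sum the UTF-8 size of every line in every event."""
    return sum(len(line.encode("utf-8")) for event in events for line in event)


# Hypothetical event: 120 lines of 100 bytes each.
event = ["x" * 100 for _ in range(120)]

split_events = split_every(event, 50)
print(len(split_events), total_bytes(split_events))   # events produced, bytes kept
print(total_bytes([keep_first(event, 50)]))           # bytes kept when discarding
```

Splitting keeps every byte and only spreads it over more events, so it does not reduce ingested volume; only discarding the excess does.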
