Getting Data In

Why is Splunk dropping data from a log file with error "Dropping data without forwarding since the size is greater than 67108864, event size=95536310"?

theouhuios
Motivator

Hello

Splunk is dropping a lot of data from a log file in our Prod environment.

It errors out as below in the _internal logs (67108864 bytes is 64 MB):

The data is pretty straightforward, with single-line events. Has anyone seen this happen before?

Dropping data without forwarding since the size is greater than 67108864, event size=95536310, 

Props:

[scansafe]
INDEXED_EXTRACTIONS = TSV
TRUNCATE=0
SHOULD_LINEMERGE=False
disabled=false
FIELD_DELIMITER=\t
TRANSFORMS-null= setnull

Transforms:

[setnull]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue

Events look like

#Fields: datatime       c-ip    cs(X-Forwarded-For)     cs-username     cs-method       cs-uri-scheme   cs-host cs-uri-port     cs-uri-path     cs-uri-query    cs(User-Agent)  cs(Content-Type)        cs-bytes        sc-bytes        sc-status       sc(Content-Type)        s-ip    x-ss-category   x-ss-last-rule-name     x-ss-last-rule-action   x-ss-block-type x-ss-block-value        x-ss-external-ip        x-ss-referer-host
2015-01-13 15:05:05 GMT 10.1.1.1           WinNT://NASDCORP\abcd CONNECT https   0009b601.pphosted.com   10000   /               Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; )        -       0       0       0               10.1.1.1  c:busi  default do_not_inspect                  10.1.1.1
2015-01-13 15:05:58 GMT 10.1.1.1              10.1.1.1      CONNECT https   10.1.1.1 443     /               Mozilla/4.0 (compatible)        -       0       0       0               10.1.1.1 c:comp  default do_not_inspect                  10.1.1.1
2015-01-13 15:05:34 GMT 10.1.1.1                WinNT://NASDCORP\gghjhh       GET     http    421-vt.c3tag.com        80      /       iN=456678&cid=421&nid=x-nid:Display<-ch-nid->Armonix-RT&param4=300x250-female.jpg&param1=300x250&w=1440&h=900&sT=5      Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; )        -       0       27      200     text/html       10.1.1.1 c:busi  default allow                   10.1.1.1       bossip.com

somesoni2
Revered Legend

And the props.conf and transforms.conf are present on the SH, right?


theouhuios
Motivator

Yup. They are applied on the SH.


lguinn2
Legend
 TRUNCATE=0
 SHOULD_LINEMERGE=False

The above settings only apply at parsing time, which happens AFTER the data is forwarded. So they aren't helping if they are on the forwarder.

props.conf settings for the forwarder:

 [scansafe]
 INDEXED_EXTRACTIONS = TSV
 disabled=false
 FIELD_DELIMITER=\t

props.conf settings for the indexer:

 [scansafe]
 TRUNCATE=0
 SHOULD_LINEMERGE=False
 disabled=false
 TRANSFORMS-null= setnull

And transforms.conf goes on the indexer too, not the forwarder.

But when you choose INDEXED_EXTRACTIONS, Splunk is trying to do some parsing on the forwarder. Clearly, something is not working here. I am not sure why, and you could open a ticket with Splunk Support for details. However, you could also approach this the "old-fashioned way", which does not use indexed extractions:

For this technique, you do not need props.conf at all on the forwarder. Here are the props.conf settings for the indexer:

[scansafe]
TRUNCATE=0
SHOULD_LINEMERGE=False
disabled=false
TRANSFORMS-null= setnull
REPORT-scansafe=extract-scansafe

AND add this to your transforms.conf on the indexer:

[extract-scansafe]
DELIMS = "\t"
FIELDS = field1, field2, field3

where field1, field2, field3 are the names of the fields; you could just copy this list from the #Fields header into the stanza. Using this technique, Splunk can't pick out the field names automatically. I may not have the extract-scansafe stanza exactly right...
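
For example, using the field names from the #Fields header you posted, the stanza might look something like this (I've swapped parentheses and hyphens for underscores, since those characters are awkward in Splunk field names; adjust to taste):

 [extract-scansafe]
 DELIMS = "\t"
 FIELDS = datatime, c_ip, cs_X_Forwarded_For, cs_username, cs_method, cs_uri_scheme, cs_host, cs_uri_port, cs_uri_path, cs_uri_query, cs_User_Agent, cs_Content_Type, cs_bytes, sc_bytes, sc_status, sc_Content_Type, s_ip, x_ss_category, x_ss_last_rule_name, x_ss_last_rule_action, x_ss_block_type, x_ss_block_value, x_ss_external_ip, x_ss_referer_host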


theouhuios
Motivator

They are actually being indexed on the SH itself. There is no forwarder passing data to the indexers here; the SH does this job itself and just sends the events on to the indexers. The .txt files are being downloaded from an S3 bucket onto the SH directly.

0 Karma

lguinn2
Legend

Okay then, here are my updates to my answer above: All the files - inputs.conf, props.conf and transforms.conf - go on the search head (which is effectively a heavy forwarder in this case). Parsing will happen on the search head, and events will be dropped on the search head, with surviving events going to the indexer(s).
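
For reference, a minimal inputs.conf sketch for the search head could look like this (the monitor path is just a placeholder for wherever your S3 download lands):

 [monitor:///opt/s3-downloads/scansafe/*.txt]
 sourcetype = scansafe
 disabled = false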


jayannah
Builder

Please share your props.conf file for processing these events. What is the size of a single event? Do you see this error for each event, or how frequently does it occur?
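
To gauge the frequency and the sizes involved, something like this against the _internal index should show it (the event_bytes field name is just an example):

 index=_internal sourcetype=splunkd "Dropping data without forwarding"
 | rex "event size=(?<event_bytes>\d+)"
 | timechart count, max(event_bytes)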


theouhuios
Motivator

Added props and transforms to the first post. No, this is happening on just one file each day, and it's a random file. Even when I re-index the file, it still does the same thing. I tried it in our dev environment with the same file and it gives the correct count. This is all proxy data, so the size of a single event varies a lot. But is there a way that I can set Splunk to ignore that size and just index it?
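
For what it's worth, the spread of event sizes can be eyeballed with something like this (len(_raw) gives the length of each indexed event):

 sourcetype=scansafe | eval bytes=len(_raw) | stats count, avg(bytes), max(bytes)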
