Hello
Splunk is dropping a lot of data from a log file in our Prod environment.
The data is pretty straightforward, with single-line events. Has anyone seen this happen before?
It errors out as below in the _internal logs:
Dropping data without forwarding since the size is greater than 67108864, event size=95536310,
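For reference, you can find these errors with a search along these lines (a sketch; the source filter is an assumption based on where splunkd warnings usually land):
index=_internal source=*splunkd.log* "Dropping data without forwarding"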
Props:
[scansafe]
INDEXED_EXTRACTIONS = TSV
TRUNCATE=0
SHOULD_LINEMERGE=False
disabled=false
FIELD_DELIMITER=\t
TRANSFORMS-null= setnull
Transforms:
[setnull]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue
Events look like:
#Fields: datatime c-ip cs(X-Forwarded-For) cs-username cs-method cs-uri-scheme cs-host cs-uri-port cs-uri-path cs-uri-query cs(User-Agent) cs(Content-Type) cs-bytes sc-bytes sc-status sc(Content-Type) s-ip x-ss-category x-ss-last-rule-name x-ss-last-rule-action x-ss-block-type x-ss-block-value x-ss-external-ip x-ss-referer-host
2015-01-13 15:05:05 GMT 10.1.1.1 WinNT://NASDCORP\abcd CONNECT https 0009b601.pphosted.com 10000 / Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; ) - 0 0 0 10.1.1.1 c:busi default do_not_inspect 10.1.1.1
2015-01-13 15:05:58 GMT 10.1.1.1 10.1.1.1 CONNECT https 10.1.1.1 443 / Mozilla/4.0 (compatible) - 0 0 0 10.1.1.1 c:comp default do_not_inspect 10.1.1.1
2015-01-13 15:05:34 GMT 10.1.1.1 WinNT://NASDCORP\gghjhh GET http 421-vt.c3tag.com 80 / iN=456678&cid=421&nid=x-nid:Display<-ch-nid->Armonix-RT&param4=300x250-female.jpg&param1=300x250&w=1440&h=900&sT=5 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; ) - 0 27 200 text/html 10.1.1.1 c:busi default allow 10.1.1.1 bossip.com
And the props.conf and transforms.conf are present on the SH, right?
Yup. They are applied on the SH.
TRUNCATE=0
SHOULD_LINEMERGE=False
The above settings only apply at parsing time, which happens AFTER the data is forwarded. So they aren't helping if they are on the forwarder.
props.conf settings for the forwarder:
[scansafe]
INDEXED_EXTRACTIONS = TSV
disabled=false
FIELD_DELIMITER=\t
props.conf settings for the indexer:
[scansafe]
TRUNCATE=0
SHOULD_LINEMERGE=False
disabled=false
TRANSFORMS-null= setnull
And transforms.conf goes on the indexer too, not the forwarder.
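For completeness, transforms.conf on the indexer is just the setnull stanza you already have:
[setnull]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue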
But when you choose INDEXED_EXTRACTIONS, Splunk tries to do some parsing on the forwarder. Clearly, something is not working here. I am not sure why, and you could open a ticket with Splunk Support for details. However, you could also approach this the "old-fashioned way", which does not use indexed extractions:
For this technique, you do not need props.conf at all on the forwarder. Here are the props.conf settings for the indexer:
[scansafe]
TRUNCATE=0
SHOULD_LINEMERGE=False
disabled=false
TRANSFORMS-null= setnull
REPORT-scansafe=extract-scansafe
And add this to your transforms.conf on the indexer:
[extract-scansafe]
DELIMS = "\t"
FIELDS = "field1", "field2", "field3"
where field1, field2, field3 are the names of the fields; you could just copy the list from the #Fields header line into the stanza. Using this technique, Splunk can't pick out the field names automatically. I may not have the extract-scansafe stanza exactly right...
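For example, based on the #Fields line in your sample, the stanza could look something like this (a sketch - I've replaced parentheses and hyphens with underscores, since those characters are awkward in Splunk field names; rename however you like):
[extract-scansafe]
DELIMS = "\t"
FIELDS = "datatime", "c_ip", "cs_X_Forwarded_For", "cs_username", "cs_method", "cs_uri_scheme", "cs_host", "cs_uri_port", "cs_uri_path", "cs_uri_query", "cs_User_Agent", "cs_Content_Type", "cs_bytes", "sc_bytes", "sc_status", "sc_Content_Type", "s_ip", "x_ss_category", "x_ss_last_rule_name", "x_ss_last_rule_action", "x_ss_block_type", "x_ss_block_value", "x_ss_external_ip", "x_ss_referer_host"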
They are actually being indexed on the SH itself. There is no forwarder passing data to the indexers here; the SH itself does this job and just sends the events to the indexers. The .txt files are downloaded from an S3 bucket directly onto the SH.
Okay then, here is an update to my answer above: all the files - inputs.conf, props.conf and transforms.conf - go on the search head (which is effectively a heavy forwarder in this case). Parsing will happen on the search head, and events will be dropped on the search head, with surviving events going to the indexer(s).
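To make that concrete, here is a minimal sketch of the layout on the search head, combining the pieces above (the monitor path is a placeholder - point it at wherever the S3 downloads land):
inputs.conf:
[monitor:///opt/s3_downloads/*.txt]
sourcetype = scansafe
props.conf:
[scansafe]
TRUNCATE=0
SHOULD_LINEMERGE=False
disabled=false
TRANSFORMS-null = setnull
REPORT-scansafe = extract-scansafe
transforms.conf:
[setnull]
REGEX = ^#
DEST_KEY = queue
FORMAT = nullQueue
plus the [extract-scansafe] stanza from earlier.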
Please share your props.conf for processing these events. What is the size of a single event? Do you see this error for each event, or how frequently?
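For example, you could check event sizes in an environment where the file does index, with something like this (a sketch; substitute your actual index name):
index=YOUR_INDEX sourcetype=scansafe | eval event_bytes=len(_raw) | stats max(event_bytes) AS max_bytes BY source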
Added props and transforms to the first post. No, this is happening on just one file each day. It's a random file. Even when I re-index the file, it still does the same. I tried the file in our dev environment and it gives the correct count. This is all proxy data, so the size of a single event varies a lot. But is there a way I can set it to ignore that size and just index it?