Getting Data In

Why is a monitored file behaving like a batch file on a Splunk 6.2.1 Universal forwarder?

eallanjr
Explorer

I have a monitored file input for a .tsv file that gets updated via a SQL query every hour. However, the data only shows up in the index periodically (I haven't been able to determine the frequency, but it isn't hourly like it should be). If I restart the forwarder I see the TailingProcessor add a watch, but the file subsequently gets handled by the BatchReader, as shown in the log snippet below. For other file inputs using [monitor://...] stanzas I don't see any log entries related to the BatchReader. Any ideas why this one is being treated differently? The Universal Forwarder is version 6.2.1.

# grep metrics5.tsv /opt/splunkforwarder/var/log/splunk/splunkd.log
04-27-2015 09:46:13.653 -0400 INFO  TailingProcessor - Parsing configuration stanza: monitor:///data/log/hadoop_job_metrics/metrics5.tsv.
04-27-2015 09:46:13.653 -0400 INFO  TailingProcessor - Adding watch on path: /data/log/hadoop_job_metrics/metrics5.tsv.
04-27-2015 09:46:13.660 -0400 INFO  BatchReader - Removed from queue file='/data/log/hadoop_job_metrics/metrics5.tsv'.
04-27-2015 10:01:47.734 -0400 INFO  BatchReader - Removed from queue file='/data/log/hadoop_job_metrics/metrics5.tsv'.

The API also indicates it is being read in batch mode:
https://localhost:8089/services/admin/inputstatus/TailingProcessor%3AFileStatus

/data/log/hadoop_job_metrics/metrics5.tsv   
file position   23783042
file size   23783042
percent 100.00
type    done reading (batch)

inputs.conf:

[monitor:///data/log/hadoop_job_metrics/metrics5.tsv]
disabled = false
sourcetype = hadoop_job_metrics_v2
index = main
crcSalt = <SOURCE>

props.conf:

[hadoop_job_metrics_v2]
FIELD_DELIMITER = tab
FIELD_NAMES = JOB_ID,JOB_STATUS,JOB_FAILED_MAP_ATTEMPTS,JOB_FAILED_REDUCE_ATTEMPTS,JOB_FILE_BYTES_WRITTEN,JOB_FINISHED_MAP_TASKS,JOB_FINISHED_REDUCE_TASKS,JOB_PRIORITY,JOB_TOTAL_LAUNCHED_MAPS,JOB_TOTAL_LAUNCED_REDUCES,JOB_CPU_MILLISECONDS,MAP_CPU_MILLISECONDS,RED_CPU_MILLISECONDS,JOB_MAPRFS_BYTES_READ,MAP_MAPRFS_BYTES_READ,RED_MAPRFS_BYTES_READ,JOB_MAPRFS_BYTES_WRITTEN,MAP_MAPRFS_BYTES_WRITTEN,RED_MAPRFS_BYTES_WRITTEN,JOB_PHYSICAL_MEMORY_BYTES,MAP_PHYSICAL_MEMORY_BYTES,RED_PHYSICAL_MEMORY_BYTES,JOB_VIRTUAL_MEMORY_BYTES,MAP_VIRTUAL_MEMORY_BYTES,RED_VIRTUAL_MEMORY_BYTES,JOB_NAME,PARENT_JOB_ID,USER_SUBMITTED,TIME_SUBMITTED,TIME_STARTED,TIME_FINISHED,CLUSTER_ID,CREATED
HEADER_FIELD_DELIMITER = tab
INDEXED_EXTRACTIONS = tsv
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
TIMESTAMP_FIELDS = CREATED
category = Structured
description = Tab-separated value format. Set header and other settings in "Delimited Settings"
disabled = false
pulldown_type = true
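As a side note on the props.conf above: with INDEXED_EXTRACTIONS = tsv, each tab-delimited row is split and its values are paired positionally with the names in FIELD_NAMES. A minimal sketch of that pairing (only the first few field names are shown, and the sample row values are made up, not from the actual file):

```python
# Positional pairing of a tab-delimited row with FIELD_NAMES,
# as INDEXED_EXTRACTIONS = tsv does at index time.
field_names = ["JOB_ID", "JOB_STATUS", "JOB_FAILED_MAP_ATTEMPTS"]

row = "job_201504270001\tSUCCEEDED\t0"   # hypothetical sample row
values = row.split("\t")                 # FIELD_DELIMITER = tab

record = dict(zip(field_names, values))
print(record["JOB_STATUS"])              # prints "SUCCEEDED"
```

Because the pairing is purely positional, a column added or dropped by the SQL query without a matching FIELD_NAMES change would silently shift every field after it.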

balaji_venkat
Explorer

Any file larger than about 20 MB (the default `min_batch_size_bytes` in limits.conf, 20971520 bytes) is automatically handed to the BatchReader when the Universal Forwarder processes it. Your file is roughly 23 MB, so it crosses that threshold.
This is answered below already:

https://answers.splunk.com/answers/109779/when-is-the-batchreader-used-and-when-is-the-tailingproces...
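A quick sketch of the size check that decides this, using the file size reported by the inputstatus endpoint above and the default threshold (assuming `min_batch_size_bytes` has not been overridden in limits.conf):

```shell
# Compare the monitored file's size against the default BatchReader
# threshold (min_batch_size_bytes = 20971520 bytes, i.e. 20 MB).
THRESHOLD=20971520   # default min_batch_size_bytes
SIZE=23783042        # size reported for metrics5.tsv by inputstatus

if [ "$SIZE" -gt "$THRESHOLD" ]; then
  echo "batch"       # file is handed to BatchReader
else
  echo "tail"        # file stays with TailingProcessor
fi
# prints "batch"
```

If you want the file to stay with the TailingProcessor, raising `min_batch_size_bytes` in limits.conf on the forwarder should reportedly work, but verify the setting against the limits.conf documentation for your Splunk version.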
