I'm having trouble getting a CSV file that I've stored in Amazon S3 to properly split at index-time.
I'm using the Splunk Add-on for AWS, which allows me to define an S3 bucket to monitor. It pulls the data down just fine when a new CSV is uploaded:
[aws_s3://s3_autoruns]
disabled = false
aws_account = Splunk Reader
bucket_name = mybucket
index = jm
initial_scan_datetime = default
interval = 30
max_items = 100000
max_retries = 10
recursion_depth = 3
sourcetype = s3_autoruns
whitelist = .*/autoruns.txt$
blacklist = .*
character_set = UTF-16LE
I have in my props.conf a working transform (which changes the Host field to part of the S3 url), so I know this stanza is hitting for this data.
[source::.../autoruns.txt]
TRANSFORMS-s3host = transform-s3-integhost
DATETIME_CONFIG=CURRENT
With this, I get an event per line of the file.
I think I should be able to add to my props.conf:
INDEXED_EXTRACTIONS=CSV
FIELD_NAMES=Time,EntryLocation,Entry,Enabled,Category,Description,Publisher,ImagePath,LaunchString,MD5,SHA-1,SHA-256
FIELD_DELIMITER=,
But when I do that, it does not change anything. I still get one event per line, and no EntryLocation field to search on.
Any thoughts?
Thanks,
--Joe
I have run into this similar issue when streaming data via scripted input into Splunk. In the interim, please use the DELIMS option for search time field extractions:
http://docs.splunk.com/Documentation/Splunk/6.2.1/Admin/transformsconf
If I mirror the S3 bucket to a local directory and monitor it, it splits nicely:
[monitor:///data]
disabled = 0
crcSalt = <SOURCE>
index = jm
sourcetype = s3_autoruns
whitelist = .*/autoruns.txt$
--Joe