I've had the misfortune of feeding 30K input files of Amazon CloudFront logs from S3 into my live Splunk instance without specifying a sourcetype.
This has created a serious problem: it has produced thousands of automatically created variants of sourcetype-too-small, triggered by the bizarre headers that Amazon likes to use (note that the REAL data does not cause this issue).
As a result, performance has slowed to a crawl.
I've deleted the "bad" events, but is there something I can do about the bad automatically created sourcetypes?
As to why I didn't notice this: it didn't become a problem until the number of sourcetypes grew to a prodigious count, and since my searches excluded the bad events, I never noticed the sourcetypes.
Hi markgo,
I recently fixed this by adding the following to my props.conf and transforms.conf:
**props.conf**
[default]
TRANSFORMS-meta = fix_auto_source
**transforms.conf**
[fix_auto_source]
SOURCE_KEY = MetaData:Source
DEST_KEY = MetaData:Source
REGEX = ^(/.*|.:.*)
FORMAT = source::splunktcp://25000
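This rewrites every source that looks like a Unix path (starting with /) or a Windows path (a drive letter followed by a colon) to splunktcp://25000, so all those automatically created sources collapse into a single value.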
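To check which source values the stanza's REGEX would catch before deploying it, here is a rough Python sketch of just the REGEX/FORMAT logic. It is not Splunk's pipeline. It assumes the pattern is tested against the bare source path (the MetaData:Source value Splunk hands to the regex may carry a `source::` prefix, so verify this on your own instance). It also uses Python's `re` in place of Splunk's PCRE engine, and the sample paths and the `rewrite_source` helper are hypothetical.

```python
import re

# REGEX and FORMAT copied from the [fix_auto_source] stanza.
# Assumption: the pattern sees the bare source path, not a "source::"-prefixed value.
SOURCE_REGEX = re.compile(r"^(/.*|.:.*)")
REPLACEMENT_SOURCE = "splunktcp://25000"  # FORMAT without its "source::" prefix


def rewrite_source(source: str) -> str:
    """Hypothetical helper: return the source value this transform would leave behind."""
    if SOURCE_REGEX.match(source):
        # Unix path ("/...") or Windows drive path ("C:...") gets collapsed.
        return REPLACEMENT_SOURCE
    # Anything else (e.g. network inputs) is left untouched.
    return source


if __name__ == "__main__":
    samples = [
        "/data/cloudfront/E2EXAMPLE.2013-01-01-00.gz",  # hypothetical Unix path
        r"C:\logs\cloudfront\E2EXAMPLE.gz",              # hypothetical Windows path
        "udp:514",                                       # hypothetical network source
    ]
    for s in samples:
        print(f"{s!r} -> {rewrite_source(s)!r}")
```

Running it against a few of your real source values is a cheap way to tune the regex before it touches the index.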
Hope this helps a bit. Don't forget to change the regex to match your own source paths.
Regards