After a lot of painful experience dealing with this issue in customer installs, I've come to the following conclusions:
1) Given CHECK_FOR_HEADER's behavior of trapping important stanzas in 'learned' and renaming sourcetypes from 'foo' to 'foo-N', CHECK_FOR_HEADER is indeed evil. However, if you have a sourcetype where the header fields are not reliably the same from system to system, it can be considered a necessary evil.
2) It gets worse before it gets better. In addition to breaking when you're using any kind of Splunk forwarder, it also breaks in distributed search. In short, the knowledge on the search head is always what gets used to run the various aspects of the search, and as long as those AutoHeader rules are not on the search head, those searches won't run right.
3) Despite all this rampant evil, it's possible to get things back in working order by diligently copying the "learned" stanzas out of "/etc/apps/learned/props.conf" and "/etc/apps/learned/transforms.conf", and putting those, along with all your base CHECK_FOR_HEADER stanzas, onto all indexers and search heads. This will make everything work properly. The main drawback is that whenever your production system decides to output a new kind of header row for the sourcetype, one that the Splunk system has never seen, those new learned configs will also be trapped on the forwarders and you'll have to do it all again.
4) While you can also set up specific regexes instead, you'll hit limits there. Possibly depending on how you construct the field extraction, you'll have a 32-character limit on field names and also a 100-field limit. (Both of these limits are too low for this to be feasible in the data I'm talking about.)
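To make the copying in point 3 less error-prone, here is a rough Python sketch of the collection step. It is only an illustration under some assumptions: the sample stanzas, the setting names inside them, and the helper names (parse_stanzas, select_sourcetype, referenced_transforms, render) are all made up for this example, and the parser ignores things like backslash line continuations. It picks out the props stanzas for 'foo' and 'foo-N', then pulls the transforms stanzas that those stanzas reference through REPORT- settings.

```python
import re

STANZA_RE = re.compile(r"^\[(.+)\]$")


def parse_stanzas(conf_text):
    """Parse .conf-style text into {stanza_name: [setting lines]}."""
    stanzas = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = STANZA_RE.match(line)
        if match:
            current = match.group(1)
            stanzas.setdefault(current, [])
        elif current is not None:
            stanzas[current].append(line)
    return stanzas


def select_sourcetype(props, base):
    """Keep only the stanzas named 'base' or 'base-N'."""
    pattern = re.compile(r"^" + re.escape(base) + r"(-\d+)?$")
    return {name: lines for name, lines in props.items() if pattern.match(name)}


def referenced_transforms(props):
    """Collect the transform names listed in REPORT-<class> settings."""
    names = set()
    for lines in props.values():
        for line in lines:
            key, sep, value = line.partition("=")
            if sep and key.strip().startswith("REPORT-"):
                names.update(v.strip() for v in value.split(",") if v.strip())
    return names


def render(stanzas):
    """Turn the parsed stanzas back into .conf text, sorted by name."""
    chunks = []
    for name in sorted(stanzas):
        chunks.append("[" + name + "]")
        chunks.extend(stanzas[name])
        chunks.append("")
    return "\n".join(chunks)


# Hypothetical sample content, standing in for the learned props/transforms.
SAMPLE_PROPS = """
[foo]
CHECK_FOR_HEADER = true

[foo-2]
REPORT-AutoHeader = AutoHeader-1

[bar]
SHOULD_LINEMERGE = false
"""

SAMPLE_TRANSFORMS = """
[AutoHeader-1]
FIELDS = "time", "caller", "callee"
DELIMS = ","

[unrelated]
REGEX = x
"""

props = select_sourcetype(parse_stanzas(SAMPLE_PROPS), "foo")
wanted = referenced_transforms(props)
all_transforms = parse_stanzas(SAMPLE_TRANSFORMS)
transforms = {n: all_transforms[n] for n in wanted if n in all_transforms}

print(render(props))
print(render(transforms))
```

In practice you would read the real learned files from each forwarder instead of the sample strings, merge the results, and review the output by hand before pushing it to the indexers and search heads.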
Specifics:
Specifically, I think it's a necessary evil in Cisco CallManager data. From version to version, Cisco changes the field list slightly. Since my app has to work with all possible versions, from old builds like 4.X up through 8.X, and since it's not feasible for me to know ahead of time all the dozens if not hundreds of different headers that have existed over the past ten years, my hand is a little forced. Also, several of the field names are longer than 32 characters, and there are over a hundred of them per row.
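If you want to see quickly whether a given header row would run into the limits from point 4, something like the following Python sketch can help. The 32 and 100 values are simply the limits as I described them above, and check_header is a made-up helper name, so treat this as a sanity check rather than anything authoritative.

```python
import csv

# Limits as described in point 4; adjust if your setup behaves differently.
MAX_NAME_LEN = 32
MAX_FIELDS = 100


def check_header(header_line):
    """Report how a CSV header row compares against the two limits."""
    names = next(csv.reader([header_line]))
    return {
        "field_count": len(names),
        "too_many_fields": len(names) > MAX_FIELDS,
        "too_long": [n for n in names if len(n) > MAX_NAME_LEN],
    }
```

Feeding it the first line of a CSV export tells you the field count and lists any field names over the length limit.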
Obviously, in the long term I am eager for Splunk to come up with some better method of indexing CSVs, a way that isn't subject to the current problems but still preserves the nice parts of the current automatic behavior.