I'm building a Splunk App and I'd like my users to be able to point the import a single folder and have it accurately import each type of IRC log. However, Even with a well defined source type of:
[ZNC]
pulldown_type = true
category = IRC
LINE_BREAKER = \r{0,1}\n
SHOULD_LINEMERGE = false
MAX_TIMESTAMP_LOOKAHEAD = 15
EXTRACT-baseinfo = (?P<zncusername>\w+)[\\\/](?P<network>\w+)[\\\/](?P<channel>#.+)[\\\/](?:.+\.log) in source
EXTRACT-IRCmessage = (?:\[.+\]) <(?P<channeluser>.+)> (?P<message>.+)
EXTRACT-userjoinquitpart = (\*\*\*) (?P<event>.+): (?P<channeluser>.+) \((?P<ident>.+)@(?P<userhost>.+?)\)( \((?P<eventreason>.+?)\)){0,1}
EXTRACT-topic = \*\*\* (?P<uername>.+? ).+? (?P<event>.+?) to (?P<topic>\'.+')
EXTRACT-kicked = \*\*\* (?P<uername>.+?) was (?P<event>.+? )by (?P<eventintiatedby>.+?) \((?P<eventreason>.+)\)
LOOKUP-IRC_actions = EventLookup action AS event OUTPUT IRCevent as event
It will still randomly pick up partial dates as the source type. I've read about a field where you can specify source file regex for each sourcetype, but several of my planned source types have identical naming schemes.
So, What can I do to give hints to the autotype that this is type A vs type B?
Have you tried adding TIME_PREFIX and TIME_FORMAT in props.conf to help splunk know what is a correct time stamp?
Also your LINE_BREAKER doesn’t contain a capture group but it should. The capture group gets removed from the data so typically you want to use ([\n\r]+) which will break on every line and remove the line feeds from the data. Other times you might want some addditional regex like ([\n\r]+)additionalRegexHere to define event breaks while still dropping the line feeds.