We have 4 servers running applications that should log to Splunk.
The log types are:
2x Apache: sourcetype=access_combined
1x app with log4j: sourcetype=log4j
1x app with log4j: sourcetype=log4j, but with different contents
inputs.conf will be:
[udp://4444]
sourcetype = ????
Questions:
Is Splunk able to select the correct parser? How is this done?
Which sourcetype do I have to enter in inputs.conf?
How can I extend the field extractions in props/transforms if I do not know the sourcetype for this particular input? (I may have some other log4j inputs, too.)
Technically speaking, you can do something like this:
props.conf
[source::udp:4444]
TRANSFORMS-set_sourcetype = set_sourcetype_access_combined, set_sourcetype_log4j, set_sourcetype_something_else
transforms.conf
[set_sourcetype_access_combined]
DEST_KEY = MetaData:Sourcetype
REGEX = some regex that matches your access_combined data
# For example, the REGEX used for Cisco ASA matching is REGEX = %ASA-\d-\d{6}
# Check out the Splunk Add-on for Cisco ASA for more examples.
FORMAT = sourcetype::access_combined
[set_sourcetype_log4j]
DEST_KEY = MetaData:Sourcetype
REGEX = some regex that matches your log4j data
FORMAT = sourcetype::log4j
[set_sourcetype_something_else]
DEST_KEY = MetaData:Sourcetype
REGEX = some regex that matches your other data (you get the idea by now...)
FORMAT = sourcetype::your_sourcetype
But, as @FrankVI pointed out, maintaining the regexes can become cumbersome over time if your data changes.
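To make the placeholders a bit more concrete, here is a rough sketch of what two of the REGEX lines might look like. Both patterns are assumptions on my side: the first expects the standard Apache combined layout, the second a log4j pattern layout that starts with a "yyyy-MM-dd HH:mm:ss,SSS" timestamp followed by the level. They are deliberately not anchored to the start of the event, since a sender may prepend a syslog header.

```ini
# transforms.conf (illustrative patterns only, adjust them to your real data)

[set_sourcetype_access_combined]
DEST_KEY = MetaData:Sourcetype
# client ident user [timestamp] "METHOD uri HTTP/x.y" status
REGEX = \S+ \S+ \S+ \[[^\]]+\] "[A-Z]+ \S+ HTTP/\d\.\d" \d{3}
FORMAT = sourcetype::access_combined

[set_sourcetype_log4j]
DEST_KEY = MetaData:Sourcetype
# yyyy-MM-dd HH:mm:ss,SSS LEVEL (assumed log4j pattern layout)
REGEX = \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+(?:TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b
FORMAT = sourcetype::log4j
```

If the two log4j applications need to be told apart, the second one would need its own pattern keyed on something that only appears in its events, such as a logger name.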
OK, this might come with some limitations, but it helps me in a similar context, where I have to split the events from one data source into different indexes based on a regex (to distinguish user access rights).
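For the index-splitting case, the same mechanism can be pointed at the index instead of the sourcetype. A minimal sketch, assuming a target index called restricted_index that already exists; the stanza name, the index name and the regex placeholder are all hypothetical:

```ini
# props.conf
[source::udp:4444]
TRANSFORMS-route_index = route_to_restricted_index

# transforms.conf
[route_to_restricted_index]
REGEX = some regex that matches the events for the restricted index
# _MetaData:Index rewrites the destination index of matching events
DEST_KEY = _MetaData:Index
FORMAT = restricted_index
```

Events that do not match the regex stay in the index configured on the input.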
Thanks for that elaboration on what can be done @jconger 🙂
It is not just cumbersome, though. If the events require different index-time processing (e.g. a different TIME_FORMAT or LINE_BREAKER setting), it is simply impossible to use a combined input like this. The sourcetype override from a transform is applied after line breaking and timestamp extraction have already run, so a metadata field override cannot influence the index-time config that gets applied.
E.g. following your example, if you added the following to props.conf, these settings would be completely ignored for this data:
[access_combined]
TIME_FORMAT = foo
LINE_BREAKER = yada
[log4j]
TIME_FORMAT = bar
LINE_BREAKER = bla
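Search-time settings are a different story: they are resolved against the final sourcetype when you search, so field extractions keyed to the overridden sourcetype should still apply. A minimal sketch, assuming a log4j layout of "date time LEVEL ..."; the class name and field name log_level are made up for illustration:

```ini
# props.conf (search-time, on the search head)
[log4j]
# Inline extraction: skip the date and time tokens, then capture the level
EXTRACT-log_level = ^\S+ \S+\s+(?<log_level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b
```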
I would really suggest figuring out a different solution than sending several very different log types to a single UDP input. At least send different log types to different ports, but even better: just put a Universal Forwarder on each of those servers and read the logs locally from files.
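Both alternatives can be sketched roughly as follows. The extra port numbers, the file paths and the sourcetype name log4j_app2 are assumptions for illustration, not values from your environment:

```ini
# Option 1: inputs.conf on the receiving Splunk instance, one UDP port per log type
[udp://4444]
sourcetype = access_combined

[udp://4445]
sourcetype = log4j

# Hypothetical separate sourcetype for the second log4j application
[udp://4446]
sourcetype = log4j_app2

# Option 2: inputs.conf on a Universal Forwarder installed on each server
[monitor:///var/log/apache2/access.log]
sourcetype = access_combined

[monitor:///opt/app1/logs/app.log]
sourcetype = log4j
```

With the sourcetype fixed at input time, index-time settings such as TIME_FORMAT and LINE_BREAKER under the matching props.conf stanza apply as expected, and giving the second log4j application its own sourcetype lets it carry its own extractions.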
Although there are ways to set a generic sourcetype in inputs.conf and then use props and transforms to override it based on event content, that gets messy quite quickly, and certain index-time settings (like TIME_FORMAT) will not be able to use the overridden sourcetype.
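For completeness, the generic-sourcetype variant might look like the sketch below. The name mixed_udp is a hypothetical placeholder, and the referenced transforms are assumed to be defined in transforms.conf with DEST_KEY = MetaData:Sourcetype:

```ini
# inputs.conf
[udp://4444]
# Generic placeholder sourcetype, replaced per event by the transforms below
sourcetype = mixed_udp

# props.conf
[mixed_udp]
# Index-time settings placed here apply to ALL events arriving on this input
TRANSFORMS-set_sourcetype = set_sourcetype_access_combined, set_sourcetype_log4j
```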