Is it possible to define custom fields and hard-code their values on a per-forwarder basis?
I'm looking to use such a feature to define a custom field for server "environment" that will be hard-coded on each forwarder. For example, a group of nodes will be part of a QA1 environment and another will belong to QA2. On the forwarders for these nodes I want to be able to hard-code the "QA1" and "QA2" values to support searches that filter on the environment.
I'm not sure this is possible with props.conf and transforms.conf since the values are not present in actual log messages.
It's not really clear what the ultimate goal here is. Would it be sufficient to simply tag the host field, so that your hosts are categorized? http://www.splunk.com/base/Documentation/latest/Knowledge/Tagthehostfield
There can sometimes be a difference between forwarder and host, but in many environments they will be the same.
There can be some performance concerns surrounding tag-category searches where the number of hosts per category is large (hundreds), and the category is the constraining factor (large time range, no other constraints). If you do happen to fall into this sort of case, it might be worth creating an index-time field that categorizes the data.
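For reference, tagging the host field boils down to stanzas in tags.conf (the host names and tag names below are made up for illustration; the docs linked above cover doing this through the UI as well):

[host=qa1-node-01]
qa1 = enabled

[host=qa2-node-01]
qa2 = enabled

You can then constrain a search with something like tag::host=qa1.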
We do something similar, tagging an application running on a host with its "environment type" - e.g. PROD, QA, UAT.
At the forwarder, in inputs.conf:
[monitor:///path/to/some/app/log]
disabled = false
sourcetype = my_app
env=UAT
At the indexer in transforms.conf:
[myapp_environment]
SOURCE_KEY = env
REGEX = (.*)
FORMAT = myapp_env::$1
WRITE_META = true
At the indexer in props.conf:
[my_app]
TRANSFORMS-env=myapp_environment
This way, it gets defined as an indexed field as well. With this configuration, the inputs.conf at the forwarder controls what classification a specific application gets. We have many monitor:// stanzas, one pointing to each app instance with the appropriate environment setting. We also extract the "app name" as an indexed field from the log file name. So, given an app's name and its environment classification you can do a relatively fast search for it.
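With that in place, searches can constrain on the indexed field directly, along the lines of (using the field names from the config above):

sourcetype=my_app myapp_env=UAT

Because myapp_env is an indexed field, this filtering happens without having to parse the raw events at search time.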
Add a stanza to transforms.conf like this
[extra-tag]
REGEX = (.)
FORMAT = YOUR_FIELD::YOUR_VALUE
props.conf should reference this transform for your sourcetype.
Optionally, you can add the field to fields.conf to be stamped onto the metadata.
The transforms.conf would also need:
WRITE_META = TRUE
and you'd add an entry in fields.conf:
[YOUR_FIELD]
INDEXED = true
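To make the props.conf reference mentioned above concrete, it would look something like this (your_sourcetype here is a placeholder for whatever sourcetype your data actually uses):

[your_sourcetype]
TRANSFORMS-extratag = extra-tag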
You may also want to consider a lookup table instead of tags.
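As a sketch of the lookup approach (the file name, field names, and host values below are all made up): map hosts to environments in a CSV, reference it in transforms.conf, and wire it up as an automatic lookup in props.conf.

env_by_host.csv:

host,environment
qa1-node-01,QA1
qa2-node-01,QA2

transforms.conf:

[env_lookup]
filename = env_by_host.csv

props.conf:

[host::*]
LOOKUP-env = env_lookup host OUTPUT environment

This keeps the mapping in one editable file instead of spread across forwarder configs, at the cost of the lookup running at search time.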
Remember that you can't really create index-time fields except at index time. It's possible but quite laborious to add them later (you'll be digging into some underdocumented product features). You may want to do some measurement around this case if you're likely to have a fairly large deployment or want to search over large time ranges. I expect that tagging will be fine for most users and most search patterns.
Tagging the host field makes the most sense in our environment. I have set this up and it works great. Thank you for pointing this out.
It is also good to know this can be done at indexing time if we hit any search performance issues down the road. In this environment it would be acceptable to disregard existing data and begin indexing with the new field so retrofitting the index would not be a concern.