Is it possible to define custom fields and hard-code their values on a per-forwarder basis?
I'm looking to use such a feature to define a custom field for server "environment" that will be hard-coded on each forwarder. For example, a group of nodes will be part of a QA1 environment and another will belong to QA2. On the forwarders for these nodes I want to be able to hard-code the "QA1" and "QA2" values to support searches that filter on the environment.
I'm not sure this is possible with props.conf and transforms.conf since the values are not present in actual log messages.
It's not really clear what the ultimate goal here is. Would it be sufficient to simply tag the host field, so that your hosts are categorized? http://www.splunk.com/base/Documentation/latest/Knowledge/Tagthehostfield
There can sometimes be a difference between forwarder and host, but in many environments they will be the same.
There can be some performance concerns surrounding tag-category searches where the number of hosts per category is large (hundreds), and the category is the constraining factor (large time range, no other constraints). If you do happen to fall into this sort of case, it might be worth creating an index-time field that categorizes the data.
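For reference, tagging the host field boils down to stanzas in tags.conf (the host names and tag names below are made up for illustration; the docs linked above cover doing this through the UI as well):

[host=qa1-node-01]
qa1 = enabled

[host=qa2-node-01]
qa2 = enabled

You can then constrain a search with something like tag::host=qa1.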
We do something similar, tagging an application running on a host with its "environment type" - e.g. PROD, QA, UAT.
At the forwarder, in inputs.conf:
[monitor:///path/to/some/app/log]
disabled = false
sourcetype = my_app
env=UAT
At the indexer in transforms.conf:
[myapp_environment]
SOURCE_KEY = env
REGEX = (.*)
FORMAT = myapp_env::$1
WRITE_META = true
At the indexer in props.conf:
[my_app]
TRANSFORMS-env=myapp_environment
This way, it gets defined as an indexed field as well. With this configuration, the inputs.conf at the forwarder controls what classification a specific application gets. We have many monitor:// stanzas, one pointing to each app instance with the appropriate environment setting. We also extract the "app name" as an indexed field from the log file name. So, given an app's name and its environment classification you can do a relatively fast search for it.
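With that in place, searches can constrain on the indexed field directly, along the lines of (using the field names from the config above):

sourcetype=my_app myapp_env=UAT

Because myapp_env is an indexed field, this filtering happens without having to parse the raw events at search time.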
Add a stanza to transforms.conf like this
[extra-tag]
REGEX = (.)
FORMAT = YOUR_FIELD::YOUR_VALUE
props.conf should reference this transform for your sourcetype.
Optionally, you can add the field to fields.conf to be stamped onto the metadata.
The transforms.conf would also need:
WRITE_META = TRUE
and you'd add an entry in fields.conf:
[YOUR_FIELD]
INDEXED = true
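To make the props.conf reference mentioned above concrete, it would look something like this (your_sourcetype here is a placeholder for whatever sourcetype your data actually uses):

[your_sourcetype]
TRANSFORMS-extratag = extra-tag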
You may also want to consider a lookup table instead of tags.
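As a sketch of the lookup approach (the file name, field names, and host values below are all made up): map hosts to environments in a CSV, reference it in transforms.conf, and wire it up as an automatic lookup in props.conf.

env_by_host.csv:

host,environment
qa1-node-01,QA1
qa2-node-01,QA2

transforms.conf:

[env_lookup]
filename = env_by_host.csv

props.conf:

[host::*]
LOOKUP-env = env_lookup host OUTPUT environment

This keeps the mapping in one editable file instead of spread across forwarder configs, at the cost of the lookup running at search time.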
Remember that you can't really create index-time fields except at index time. It's possible but quite laborious to add them later (you'll be digging into some underdocumented product features). You may want to do some measurement around this case if you're likely to have a fairly large deployment or want to search over large time ranges. I expect that tagging will be fine for most users and most search patterns.
Tagging the host field makes the most sense in our environment. I have set this up and it works great. Thank you for pointing this out.
It is also good to know this can be done at indexing time if we hit any search performance issues down the road. In this environment it would be acceptable to disregard existing data and begin indexing with the new field so retrofitting the index would not be a concern.