We are looking at [potentially] adding an abstraction layer between a host and the indexers, but of course we then lose the metadata that is so key to Splunk. We are looking to use fluentd as the abstraction layer/data pipeline. In many cases, I have nice JSON output with key/value pairs, but I would like to use the values of a few keys to rewrite the metadata (host, index, source, sourcetype). So let's say we have this:
{"sourcetype":"fluentd","index":"main"}
How do I carve out those fields and rewrite them as metadata? It seems that I need to use a regex; can't I use the keys?
Any help is much appreciated!
Extraction and indexing happen in a particular order. Regex can be confusing, but this one can be really simple.
Try something like this -
props.conf:
[bv]
KV_MODE = json
INDEXED_EXTRACTIONS = json
TRANSFORMS-extract = json_extraction, index_reset
FIELDALIAS-conn_id = protocol.session_id AS conn_id
transforms.conf:
[json_extraction]
SOURCE_KEY = _raw
DEST_KEY = _raw
REGEX = ^([^{]+)({.+})$
FORMAT = $2
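[index_reset]
# NOTE: "index" is not a documented index-time SOURCE_KEY; this only works if that field exists when the transform runs
SOURCE_KEY = index
# Key names are case-sensitive: _MetaData:Index, not _MetaData:index
DEST_KEY = _MetaData:Index
# A capture group is required, otherwise $1 is empty
REGEX = (.+)
FORMAT = $1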
Most of this was copied from your comment on this question: https://answers.splunk.com/answers/501118/setting-event-time-and-host-metadata-from-keyvalue.html. I've just modified it so that TRANSFORMS-extract runs two transforms stanzas, the second of which takes the entire value of the index field and uses it to rewrite the index metadata.
The basic method is cribbed from here: https://answers.splunk.com/answers/1026/route-data-to-index-based-on-host.html
Wiser heads should feel free to comment on any issues with this code.
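For the sample event in the question, here is a minimal sketch of the regex approach: instead of reading a JSON key directly, the transform runs against _raw and captures the string after "index":. The props.conf stanza [fluentd_json] and the transform name index_from_json are hypothetical; substitute the sourcetype your events actually arrive with. It assumes the value is a plain quoted string with no escaped quotes, that the target index already exists on the indexers, and that this config lives on the instance that parses the data (indexer or heavy forwarder).

```ini
# props.conf
# [fluentd_json] is a hypothetical incoming sourcetype name
[fluentd_json]
TRANSFORMS-route = index_from_json

# transforms.conf
[index_from_json]
# Match against the raw event text (also the default SOURCE_KEY)
SOURCE_KEY = _raw
# Capture the string value of the first "index" key, e.g. main
REGEX = "index"\s*:\s*"([^"]+)"
# Index is the one metadata key with a leading underscore and no value prefix
DEST_KEY = _MetaData:Index
FORMAT = $1
```

With {"sourcetype":"fluentd","index":"main"}, the capture group is main, so the event would be written to the main index.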
Note: for the index, and ONLY the index, use _MetaData:Index (with the leading underscore); for any other metadata, use MetaData:Host (for example). Key names are case-sensitive.
http://docs.splunk.com/Documentation/Splunk/5.0.3/Admin/Transformsconf
Thank you for the response. It did not work:
props.conf:
[bv]
KV_MODE = json
INDEXED_EXTRACTIONS = JSON
TRANSFORMS-extract = json_extraction, host_extraction
FIELDALIAS-conn_id = protocol.session_id AS conn_id
FIELDALIAS-timestamp = protocol.timestamp AS ts
TIMESTAMP_FIELDS = "protocol.timestamp"
transforms.conf:
[json_extraction]
SOURCE_KEY = _raw
DEST_KEY = _raw
REGEX = ^([^{]+)({.+})$
FORMAT = $2
[host_extraction]
SOURCE_KEY = protocol.host
DEST_KEY = MetaData:host
REGEX = .
FORMAT = $1
You will see that I am trying to rewrite the host metadata and not the index. However, I get this error:
Splunk> Take the sh out of IT.
Checking prerequisites...
Checking http port [8000]: open
Checking mgmt port [8089]: open
Checking appserver port [127.0.0.1:8065]: open
Checking kvstore port [8191]: open
Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _internal _introspection _telemetry _thefishbucket bro bv firedalerts history main os summary unix_summary
Done
Checking filesystem compatibility... Done
Checking conf files for problems...
Done
Undocumented key used in transforms.conf; stanza='host_extraction' setting='SOURCE_KEY' key='protocol.host'
Undocumented key used in transforms.conf; stanza='host_extraction' setting='DEST_KEY' key='MetaData:host'
Please resolve these problems by correcting typos in key names, or by adding them to [accepted_keys] in transforms.conf if they are intended.
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunk/splunk-6.5.2-67571ef4b87d-linux-2.6-x86_64-manifest'
All installed files intact.
Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ OK ]
Waiting for web server at http://127.0.0.1:8000 to be available........... Done
What am I missing here? Again, any help is MUCH appreciated!
It is telling you that the key 'protocol.host' is not one Splunk recognizes at the time the config is being analyzed. Check that the spelling and capitalization are exactly what you expect to extract, and that the underlying field will have been extracted before this rule runs. If so, you can tell Splunk not to worry about it with an entry in [accepted_keys]. Note that this only silences the startup check; it does not by itself make the field available to the transform.
Regarding the second one, from reviewing a few other posts, I believe the key has to be written MetaData:Host, with a capital H.
http://docs.splunk.com/Documentation/Splunk/5.0.3/Admin/Transformsconf
"By adding entries to [accepted_keys], you can tell Splunk that a key that is not documented is a key you intend to work for reasons that are valid in your environment / app / etc."
[accepted_keys]
is_valid = protocol.host
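Alternatively, a sketch of how host_extraction might look if it reads the value out of _raw instead of using protocol.host as the SOURCE_KEY, which avoids the undocumented-key warning altogether. It assumes the raw event (after json_extraction has trimmed it) contains "protocol": {..., "host": "..."}, that host is a direct string member of protocol appearing before any closing brace inside it, and that the value has no escaped quotes; adjust the regex if your data differs. The props.conf line TRANSFORMS-extract = json_extraction, host_extraction stays as it is.

```ini
# transforms.conf
[host_extraction]
SOURCE_KEY = _raw
# Capture the string value of "host" inside the "protocol" object.
# Assumption: no closing brace appears in protocol before the host key.
REGEX = "protocol"\s*:\s*\{[^}]*?"host"\s*:\s*"([^"]+)"
# Key names are case-sensitive: MetaData:Host, not MetaData:host
DEST_KEY = MetaData:Host
# Host values must carry the host:: prefix
FORMAT = host::$1
```

For example, with {"protocol":{"session_id":"abc","host":"web01"}} the capture group is web01, so the event's host would become web01.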
Did this question ever get answered? I find it hard to believe that Splunk cannot rewrite metadata from KV pairs (or JSON, XML, etc.). The links above do not seem to cover this from a KV-pair perspective; they all seem to require a REGEX. What am I missing?
This sounds like overriding metadata such as host, sourcetype, or index based on event data. You have to use index-time TRANSFORMS to override them. See these links for overriding host and sourcetype; overriding the index works the same way, with DEST_KEY = _MetaData:Index.
https://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Overridedefaulthostassignments
http://docs.splunk.com/Documentation/Splunk/6.5.1/Data/Advancedsourcetypeoverrides
https://answers.splunk.com/answers/301504/how-to-override-sourcetype-and-index-assignment.html
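Following those docs, a sourcetype override driven by event data is one more transform of the same shape. Here is a sketch for the question's sample event, assuming a hypothetical incoming sourcetype [fluentd_json] and a hypothetical transform name sourcetype_from_json. The TRANSFORMS setting has to sit under the props.conf stanza for the sourcetype the event arrives with, not the one it is rewritten to.

```ini
# props.conf
# [fluentd_json] is a hypothetical incoming sourcetype name
[fluentd_json]
TRANSFORMS-meta = sourcetype_from_json

# transforms.conf
[sourcetype_from_json]
SOURCE_KEY = _raw
# Capture the string value of the first "sourcetype" key, e.g. fluentd
REGEX = "sourcetype"\s*:\s*"([^"]+)"
DEST_KEY = MetaData:Sourcetype
# Sourcetype values must carry the sourcetype:: prefix
FORMAT = sourcetype::$1
```

For {"sourcetype":"fluentd","index":"main"} this would set the sourcetype to fluentd. The same pattern with DEST_KEY = MetaData:Source and FORMAT = source::$1 covers the source field.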