I've seen the related question "Override source key in inputs.conf".
I've pretty much decided that I do want to override the source key (although I'm open to counterarguments): the question now is, to what?
Here's my situation: I'm using a proprietary, platform-specific tool to extract many types of log records from various systems on that platform. I'm then sending those extracted log records to a remote Splunk instance via either HTTP (that is, to the Splunk HTTP Event Collector; EC for short) or TCP.
For the purposes of this question, I'm going to refer to that log extraction tool as `xyz`.
Events ingested via EC have the `source` field value `http:xyz`, where `xyz` is the name of the Event Collector token that I created for this purpose, deliberately matching the name of the tool. I am dimly aware of the possibility - although no use case occurs to me right now - that, in the future, I might want to create additional EC tokens for `xyz`; perhaps I'll append qualifying terms with an underscore separator, I'm not sure.
Events ingested via TCP have the default `source` field value `tcp:6666`, where 6666 is the TCP port.
I don't feel comfortable with this default `source` value for the TCP-ingested events. I'd prefer a more "mnemonic" value that doesn't refer to a specific port number. In a multisite cluster, indexers might, for site-specific reasons, be listening on different port numbers. I'd prefer to have the same `source` value - for example, `tcp:xyz` - regardless of which indexer ingests an event, and which TCP port it's listening on.
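To make that concrete: as I understand it, the override would be a one-line addition to the TCP input stanza in inputs.conf (repeated, with whatever port applies, on each indexer). A sketch, using my port 6666:

```ini
# inputs.conf - raw TCP input stanza
# The source value here is the hypothetical tcp:xyz under
# discussion, not a Splunk default.
[tcp://6666]
source = tcp:xyz
```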
So, although this naming scheme is likely simplistic - hence this question about best practice; I'm hoping for advice from more experienced users - I'm leaning towards `source` values in the following format:

`input:sender`

where `sender` is, in my case, the tool `xyz`. So: `http:xyz` (as now) for the EC-ingested events, and `tcp:xyz` (instead of the default `tcp:6666`) for the TCP-ingested events.
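Alternatively - and I'm not sure this is better - I gather the rewrite could be done at parse time with props.conf and transforms.conf, so that per-indexer, per-port inputs.conf edits aren't needed. A sketch, assuming a made-up sourcetype name `xyz_json`:

```ini
# props.conf - xyz_json is a hypothetical sourcetype for these events
[xyz_json]
TRANSFORMS-set_source = xyz_set_source

# transforms.conf - unconditionally rewrite the source metadata field
[xyz_set_source]
REGEX = .
DEST_KEY = MetaData:Source
FORMAT = source::tcp:xyz
```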
Thoughts, suggestions welcome.
Or should the order be reversed? For example: `xyz_http`? Or should I use the same `source` value - perhaps just `xyz` - regardless of input (ingestion) method? It's difficult to put my finger on many concrete reasons. Perhaps one: I'm sending JSON to both EC and TCP, but the JSON structure is slightly different (I wish it wasn't). If I need to debug ingestion issues, it might be helpful to be able to differentiate the events; but then, the inherent differences in the structure of the JSON payloads mean I can already do that.

I understand that some of this might come down to personal preference, but I'm interested in what other people are doing, and why.
I would use `HEC:xyz`, where `HEC` is the common name for HTTP Event Collector.
@woodcock Hal is right; initially we had decided not to use HEC. However, that boat has since shipped and everyone is using it anyway. So we have relaxed that, and we are going to update our docs, which is why you are seeing HEC show up now. Thanks for reporting this.
lol, I blame @damian dallimore: "Hear ye, hear ye, henceforth HEC is a permitted term and you may use it without fear!"
Hi @woodcock, thanks for the suggestion:

> I would use `HEC:xyz` where `HEC` is the common name for HTTP Event Collector.
How common?
The first Splunk blog post tagged `http-event-collector`, "HTTP Event Collector, your DIRECT event pipe to Splunk 6.3", uses the abbreviation EC:

> HTTP Event Collector (EC) is a new, robust, token-based JSON API
So does the Splunk dev topic "Introduction to Splunk HTTP Event Collector":

> Welcome to Splunk HTTP Event Collector (EC)
So does the "Walkthrough" dev topic:

> the EC port ... an HTTP Event Collector authentication token ("EC token"). EC tokens are ... the EC event protocol ...
But then, the latest Splunk blog post tagged `http-event-collector`, "There is a “LOG”! Introducing Splunk Logging Driver in Docker 1.10.0", on 10 February 2016, refers to HEC:

> Built on the HTTP Event Collector (HEC) ... Enable HEC ... Create a New HEC Token
And Googling for `"HTTP Event Collector (HEC)" site:splunk.com` returns "about 38 results", whereas `"HTTP Event Collector (EC)" site:splunk.com` returns "about 32 results".
If any Splunk tech writers are reading this: what's the official abbreviation: EC or HEC?
I know which acronym marketing likes, and it's not HEC. I agree that some clarity is needed.
Okay, I have spoken with the product manager about this and the inconsistency you see results from a change in usage. When we first introduced the feature, we officially abbreviated it as EC. Over time, HEC became more widely used, and we have adapted our standard to reflect that. The correct abbreviation is now HEC. We are updating the docs and original blog post to reflect this change.
I was judging from speakers at Splunk events and also blogs. My experience there is that HEC is far more prevalent than EC. Your sleuthing has shown that a documentation cleanup and official statement on the matter is clearly warranted.
Good spotting!
The Splunk Splexicon lists neither EC (http://docs.splunk.com/Splexicon#anchorE) nor HEC (http://docs.splunk.com/Splexicon#anchorH).

A search of the docs for EC (http://docs.splunk.com/Special:SplunkSearch/docs?q=ec) returns 8 results, and a search for HEC (http://docs.splunk.com/Special:SplunkSearch/docs?q=hec) returns 0 results.
cheers, MuS