Monitoring Splunk

Deployment Monitor not displaying all forwarders

krussell101
Path Finder

We recently made a copy of our production environment, which runs in AWS. An individual made copies of each prod server AMI and stood up identical hosts in another zone, so none of the Splunk configurations were changed when the test hosts initially came online. I went through later and updated all of the test hosts with their own hostnames etc. in the Splunk config files.

We have a total of 17 hosts forwarding events to the indexing server, which is the same server where Deployment Monitor (DM) and the primary Web Console are installed. DM only shows 13, and some of those are duplicates: 10 if you don't count the duplicates.

I renamed every host and the requisite splunk configs after the test systems were created. They are all now using their FQDNs.

1) We used to have a host named "CoreCommandServer". I've since renamed it to use its FQDN. There is a test instance and a prod instance. DM is showing only the test instance (based on source IP address) and the name of that instance, in DM, is "CoreCommandServer", not the FQDN.

1a) Why is it not using the FQDN?
1b) Where is the prod instance? Events are coming in, but DM isn't listing it.

2) There are three servers which each show up twice in the DM list of forwarders. Identical names and source IP addresses. 2 of the 3 names are the FQDN, 1 is the old non-FQDN hostname.

I'm wondering how much timing has to do with this. If that's the case, DM seems pretty fragile.

Either way, assistance would be hugely appreciated. I would like to use DM alerting to tell me when a forwarder disappears.
Thanks!

lguinn2
Legend

This is a search that I stole from the Deployment Monitor, and then modified/simplified. It identifies "missing" forwarders by comparing a list of forwarders from the past week with a list of the forwarders from today:

index=_internal source=*metrics.log group="tcpin_connections" earliest=-7d@d latest=@d
| eval sourceHost=if(isnull(hostname), sourceHost, hostname)
| stats sum(kb) as KB_lastweek by sourceHost | eval KB_lastweek = round(KB_lastweek)
| join type=outer sourceHost
  [search index=_internal source=*metrics.log group="tcpin_connections" earliest=@d
   | eval sourceHost=if(isnull(hostname), sourceHost, hostname)
   | stats sum(kb) as KB_today by sourceHost | eval KB_today = round(KB_today) ]
| fillnull value=0 KB_today
| eval Missing = if(KB_today < 1, "Missing", "  ")

You could change this to an alert by adding

| where KB_today < 1

which would list only the "missing" forwarders, and then alert when the number of results is greater than 0.
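
The same comparison can also be sketched without a join, by reading the whole window once and splitting the kb totals at midnight. This is only a sketch: the field names KB_today_part and KB_lastweek_part are made up for illustration, and it assumes the same metrics.log fields (kb, hostname, sourceHost) used in the search above. Because both totals come out of a single stats, a forwarder that was seen last week but has sent nothing at all today still gets a row, with KB_today equal to 0.

```spl
index=_internal source=*metrics.log group="tcpin_connections" earliest=-7d@d
| eval sourceHost=if(isnull(hostname), sourceHost, hostname)
| eval KB_today_part=if(_time >= relative_time(now(), "@d"), kb, 0)
| eval KB_lastweek_part=if(_time < relative_time(now(), "@d"), kb, 0)
| stats sum(KB_today_part) as KB_today sum(KB_lastweek_part) as KB_lastweek by sourceHost
| eval KB_today=round(KB_today), KB_lastweek=round(KB_lastweek)
| where KB_today < 1
```

Saved as an alert that triggers when the number of results is greater than 0, this should behave like the version above while avoiding the subsearch.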

The Deployment Monitor has some definite weaknesses. But it is a great source for alert ideas...
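
In the same spirit, here is a rough diagnostic for the duplicate and missing entries described in the question. It rests on an assumption, not a confirmed diagnosis: hosts cloned from the same AMI may still share a forwarder GUID, and anything that identifies forwarders by GUID could then merge or mislabel them. The search assumes your version writes the guid and sourceIp fields in the tcpin_connections lines of metrics.log, and ip_count is just an illustrative name. It lists any GUID that is reporting from more than one source IP, along with the names it reports under:

```spl
index=_internal source=*metrics.log group="tcpin_connections" earliest=-24h
| stats values(hostname) as hostnames values(sourceHost) as sourceHosts values(sourceIp) as sourceIps dc(sourceIp) as ip_count by guid
| where ip_count > 1
```

If this returns rows pairing a prod host with its test copy, the clones are probably still sharing an identity. If it returns nothing, the cause lies elsewhere.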


lguinn2
Legend

Well, I think it was originally developed for a smaller environment. It doesn't really understand a large distributed environment with search heads, etc etc.

And I agree with you 🙂


krussell101
Path Finder

I'm amused because that's so far the only really good use of DM that I've found. As a great source for interesting search lines. 🙂
