I am stuck on a query for monitoring our server performance, which we need so that we have visibility into our service.
Originally we looked at each server's events individually, so we could tell the uptime and availability of that server, but to optimise our queries we had to combine all of our servers into one search.
As a result, a server that isn't sending events to Splunk is now ignored completely, which didn't happen when we searched for it specifically.
Here is my current query, which still doesn't produce a 0 for missing servers:
index=identity sourcetype="identity_tcpsocket" (LogSource=Proxy OR LogSource=Api)
| eval ServerName=if(isnotnull(ProxyHostName),ProxyHostName,upper(substr(host,0,15)))
| search [| inputlookup ProdServerHostName.csv | fields ServerHostName | rename ServerHostName as ServerName]
| stats count AS total_requests count(eval('Message.Response.Status'="500")) AS server_errors BY ServerName
| eval error_percentage=((server_errors/total_requests) * 100),
    Status=case((error_percentage >=15 OR total_requests <=20), "DOWN",
        ((error_percentage >=10 AND error_percentage < 15) OR total_requests <20), "WARNING - CHECK SERVICE",
        error_percentage <10, "OK")
| table ServerName Status
| where Status!="OK"
Ideally my output would be a list of servers that do not meet the OK criteria above.
If a server is not sending data to Splunk, I want it to be listed as DOWN.
I am currently trying to populate the full list of servers from a lookup, but that doesn't seem to be working.
Another point to mention is that the server names have to be combined into a single ServerName field, as the "host" field is sometimes an IP address or a full DNS name.
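To make the desired behaviour concrete, here is a minimal, untested sketch of one common pattern: run stats first, append one zero-count row per server from the lookup, then aggregate again so that servers with no events still appear. It assumes ProdServerHostName.csv holds one ServerHostName per row in the same case and format as the computed ServerName. The in_lookup flag is a hypothetical helper field used only to keep servers that are in the lookup.

```
index=identity sourcetype="identity_tcpsocket" (LogSource=Proxy OR LogSource=Api)
| eval ServerName=if(isnotnull(ProxyHostName),ProxyHostName,upper(substr(host,0,15)))
| stats count AS total_requests count(eval('Message.Response.Status'="500")) AS server_errors BY ServerName
``` add a zero-count row for every server in the lookup (in_lookup is a hypothetical helper flag) ```
| append [| inputlookup ProdServerHostName.csv
    | fields ServerHostName
    | rename ServerHostName AS ServerName
    | eval total_requests=0, server_errors=0, in_lookup=1]
``` merge the real counts with the zero rows; silent servers end up with total_requests=0 ```
| stats sum(total_requests) AS total_requests sum(server_errors) AS server_errors max(in_lookup) AS in_lookup BY ServerName
| where in_lookup=1
| eval error_percentage=if(total_requests=0, 0, (server_errors/total_requests) * 100)
| eval Status=case(total_requests=0, "DOWN",
    (error_percentage >=15 OR total_requests <=20), "DOWN",
    (error_percentage >=10 AND error_percentage < 15), "WARNING - CHECK SERVICE",
    error_percentage <10, "OK")
| where Status!="OK"
| table ServerName Status
```

The subsearch filter from the original query is replaced here by the in_lookup check, since a filter can only remove events and cannot create rows for servers that sent nothing. If the lookup names do not match the computed ServerName exactly (for example, when host is an IP address), those servers would show as DOWN even when they are sending data.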