I use bluepill to track the health of service processes (for example, a Java web service).
Bluepill outputs lines like:
host service1 Going from starting => up
host service1 Going from up => down
host service1 Going from down => starting
host service1 Going from up => unmonitored
The set of states is small and finite. Some are 'good' states, such as 'up'; others are 'bad', such as 'unmonitored' and 'down'.
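Because the state set is small, the good/bad split can be written down explicitly. Below is a minimal Python sketch that covers only the four states in the sample output. Treating 'starting' as a transitional state is an assumption, since bluepill does not define these categories. Inside Splunk, a mapping like this could just as well live in a lookup table.

```python
# Hypothetical categorisation of the bluepill states seen in the sample output.
# Treating "starting" as transitional is an assumption; adjust as needed.
STATE_CATEGORY = {
    "up": "good",
    "starting": "transitional",
    "down": "bad",
    "unmonitored": "bad",
}


def category(state: str) -> str:
    """Return the category of a state, or 'unknown' for states not listed."""
    return STATE_CATEGORY.get(state, "unknown")


def is_degradation(initial: str, end: str) -> bool:
    """True when a transition leaves a good state and lands in a bad one."""
    return category(initial) == "good" and category(end) == "bad"


print(is_degradation("up", "unmonitored"))  # True
print(is_degradation("down", "starting"))   # False
```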
What mechanisms are there to correlate and track this data?
For example:
1. I would like to display a graph, by host, of services that frequently transition from up => unmonitored and back.
2. I would like to alert when a given service transitions too quickly.
3. I would like to see whether all services transition to a bad state at the same time.
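Before committing to a particular Splunk search, the three checks above can be prototyped on already-parsed events. Below is a minimal Python sketch with several assumptions. Each event is assumed to be reduced to a (timestamp, host, service, initialstate, endstate) tuple; bluepill's lines above carry no timestamp, so the time would come from wherever the event is indexed. "Bad" is assumed to mean down or unmonitored, and the gap threshold is yours to choose. The function names and the host/service values in the sample data are hypothetical.

```python
from collections import Counter

# Assumed definition of "bad", taken from the states listed above.
BAD = {"down", "unmonitored"}

# Each event is assumed to be (epoch_seconds, host, service, initialstate, endstate).


def flap_counts(events):
    """Goal 1: number of up => unmonitored transitions per (host, service)."""
    return Counter(
        (host, service)
        for _ts, host, service, initial, end in events
        if (initial, end) == ("up", "unmonitored")
    )


def rapid_transitions(events, min_gap):
    """Goal 2: (host, service, ts) for every transition that follows the previous
    transition of the same host/service by less than min_gap seconds."""
    last_seen = {}
    flagged = []
    for ts, host, service, _initial, _end in sorted(events):
        key = (host, service)
        if key in last_seen and ts - last_seen[key] < min_gap:
            flagged.append((host, service, ts))
        last_seen[key] = ts
    return flagged


def all_bad_moments(events):
    """Goal 3: timestamps at which every (host, service) seen so far is in a bad
    state. Services that have not logged anything yet are not considered."""
    current = {}
    moments = []
    for ts, host, service, _initial, end in sorted(events):
        current[(host, service)] = end
        if all(state in BAD for state in current.values()):
            moments.append(ts)
    return moments


events = [
    (0, "web1", "svc1", "starting", "up"),
    (5, "web1", "svc2", "starting", "up"),
    (100, "web1", "svc1", "up", "unmonitored"),
    (110, "web1", "svc2", "up", "down"),
    (115, "web1", "svc1", "unmonitored", "starting"),
    (120, "web1", "svc1", "starting", "up"),
]
print(flap_counts(events))
print(rapid_transitions(events, min_gap=30))
print(all_bad_moments(events))
```

Goal 3 only considers services that have already logged at least one transition. A single service that starts out bad would therefore trigger it. Inside Splunk, the "time since the previous transition of this host/service" idea behind goal 2 is the kind of per-group running calculation the streamstats command is built for.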
I think the first step would be to extract the fields from the data, i.e.:
host <service> Going from <initialstate> => <endstate>
So, in props.conf, under the stanza for your bluepill sourcetype:
EXTRACT-bluepill = ^\S+\s(?<service>\S+)\sGoing\sfrom\s(?<initialstate>\S+)\s=>\s(?<endstate>\S+)$
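It can help to sanity-check the extraction regex against the sample lines before deploying it. Below is a minimal Python sketch for that. Python's re module spells named groups (?P<name>...), while the EXTRACT above uses the (?<name>...) form; otherwise the pattern is unchanged. The leading \S+ (the host in the log line) is matched but not captured. The search below therefore relies on Splunk's default host field.

```python
import re

# Same pattern as the EXTRACT-bluepill line, with Python-style named groups.
BLUEPILL_RE = re.compile(
    r"^\S+\s(?P<service>\S+)\sGoing\sfrom\s(?P<initialstate>\S+)\s=>\s(?P<endstate>\S+)$"
)


def extract(line: str):
    """Return the extracted fields as a dict, or None if the line does not match."""
    m = BLUEPILL_RE.match(line)
    return m.groupdict() if m else None


# The sample lines from the question.
sample = [
    "host service1 Going from starting => up",
    "host service1 Going from up => down",
    "host service1 Going from down => starting",
    "host service1 Going from up => unmonitored",
]
for line in sample:
    print(extract(line))
```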
Once you have the fields extracted, you can build searches that track the events containing those fields, for example:
search (initialstate=up endstate=down) OR (initialstate=down endstate=starting) | eval end_host = endstate."-".host | timechart count by end_host
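Below is a rough Python equivalent that shows what the search above counts. It keeps only the two transition pairs, builds the endstate-host key the same way the eval does, and counts per time bucket. The event tuples, the epoch-second timestamps, the host names, and the fixed one-hour bucket are assumptions for illustration. timechart chooses its own span from the time range unless you pass span=.

```python
from collections import Counter, defaultdict

# Transition pairs kept by the search: (initialstate, endstate).
WANTED = {("up", "down"), ("down", "starting")}
BUCKET_SECONDS = 3600  # assumed one-hour buckets, similar to timechart span=1h


def chart(events):
    """events: iterable of (epoch_seconds, host, service, initialstate, endstate).

    Returns {bucket_start: Counter({"endstate-host": count})}, mirroring
    `eval end_host = endstate."-".host | timechart count by end_host`.
    """
    table = defaultdict(Counter)
    for ts, host, _service, initial, end in events:
        if (initial, end) not in WANTED:
            continue  # dropped by the search clause
        bucket = ts - ts % BUCKET_SECONDS
        table[bucket][f"{end}-{host}"] += 1
    return dict(table)


events = [
    (0, "web1", "service1", "up", "down"),
    (60, "web1", "service1", "down", "starting"),
    (120, "web1", "service1", "starting", "up"),  # not one of the wanted pairs
    (3700, "web2", "service1", "up", "down"),
]
print(chart(events))
```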
Try out various searches that use those fields and let me know how it works out.
Also, the transaction command might be worth checking out: http://docs.splunk.com/Documentation/Splunk/6.3.0/SearchReference/Transaction
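The Python sketch below shows the kind of grouping transaction could do here. It pairs each up => unmonitored event with the next event that brings the same host/service back to up, then reports how long the service was away. This follows the general idea of grouping by host and service with start and end conditions (transaction has startswith and endswith options). The pairing rules are assumptions, not transaction's exact semantics, and the event tuples and names are hypothetical.

```python
def outages(events):
    """Pair each up => unmonitored with the next transition back to up for the
    same (host, service). Returns (host, service, start_ts, duration_seconds).

    events: iterable of (epoch_seconds, host, service, initialstate, endstate).
    Outages with no return to up yet are left out.
    """
    open_since = {}
    result = []
    for ts, host, service, initial, end in sorted(events):
        key = (host, service)
        if (initial, end) == ("up", "unmonitored"):
            open_since.setdefault(key, ts)  # start of an outage
        elif end == "up" and key in open_since:
            start = open_since.pop(key)  # outage closed
            result.append((host, service, start, ts - start))
    return result


events = [
    (0, "web1", "svc1", "up", "unmonitored"),
    (30, "web1", "svc1", "unmonitored", "starting"),
    (45, "web1", "svc1", "starting", "up"),
    (100, "web1", "svc1", "up", "unmonitored"),  # still open at the end
]
print(outages(events))
```

transaction also adds a duration field to each group it builds, which corresponds roughly to the last element of each tuple here.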
I considered this approach, but I was hoping there was a feature more tailor-made for tracking state transitions.
The transaction command could help here as well: http://docs.splunk.com/Documentation/Splunk/6.3.0/SearchReference/Transaction