Hi All,
I am new to Splunk and I am trying to create a graph that shows the top 10 error messages on each host. Can you please let me know how to do this? I have tried the following, but it is not working:
sourcetype=websphere ERROR | stats count by EventCode, host | sort limit=0 host, - count
| streamstats count as top by host
| where top <= 10
| stats list(EventCode) as EventCode, list(count) as count by host | timechart max(EventCode)
Can you please help me with this?
I think that you'll like this; most people do not know it, but top can use a BY clause, too:
sourcetype=websphere ERROR | top 10 EventCode BY host
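To turn that result into a graph, one option is to pivot the output of top with chart. This is an untested sketch: it assumes the count field that top emits by default, and it sets limit=0 on chart so that no EventCode columns are folded into OTHER.

```
sourcetype=websphere ERROR
| top 10 EventCode BY host
| chart limit=0 sum(count) as count over host by EventCode
```

Because each host can have a different top ten, the number of EventCode columns may grow well beyond ten.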
@woodcock - yes, I'm always dubious when doing | sort (non-zero-number) but can never pull the right head / sort / top syntax in the heat of the aircode.
There are a number of structural issues with the request.
A) The top 10 for each host might be a different ten on each day or for each host, leading to no limit on the number of overall codes that would have to be tracked.
B) You can't do a timechart without a time, and timechart is going to present the data spread over time, so the vertical (the colored line or bar) would have to represent both the host and the error code. In that case you have hundreds of lines and no decent way to visually interpret the results.
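To make point B concrete: in the original search, the first stats discards _time, so the final timechart has nothing to plot. A rough, untested sketch of the same per-host ranking that ends in chart instead of timechart might look like this (limit=0 on chart is there only to keep every EventCode column):

```
sourcetype=websphere ERROR
| stats count by EventCode, host
| sort 0 host, -count
| streamstats count as top by host
| where top <= 10
| chart limit=0 sum(count) as count over host by EventCode
```

This still has the problem from point A: the set of columns is the union of every host's top ten, so it is not bounded at ten.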
Here are a couple of examples of what you CAN do, given what you have...
Use this to get, by host, a count of the top 10 error messages overall:
sourcetype=websphere ERROR
| stats count as EventCount by EventCode, host
| rename COMMENT as "Determine the top 10 EventCodes overall and mark them, killing everything else"
| appendpipe
[| stats sum(EventCount) as EventTotal by EventCode
| sort 10 - EventTotal
]
| eventstats max(EventTotal) as EventTotal by EventCode
| where isnotnull(EventTotal) AND isnotnull(EventCount)
| fields - EventTotal
| chart sum(EventCount) as count by host EventCode
Use this to get, across time, by host, a count of the top 2 error messages overall, limited to the ten hosts that have the most of those errors:
sourcetype=websphere ERROR
| bin _time span=1d
| stats count as EventCount by EventCode, host, _time
| rename COMMENT as "Determine the top two EventCodes of all time and mark them, killing everything else"
| appendpipe
[| stats sum(EventCount) as EventTotal by EventCode
| sort 2 - EventTotal
]
| eventstats max(EventTotal) as EventTotal by EventCode
| where isnotnull(EventTotal) AND isnotnull(EventCount)
| fields - EventTotal
| rename COMMENT as "Determine the top ten hosts with those EventCodes and mark them, killing everything else"
| appendpipe
[| stats sum(EventCount) as EventTotal by host
| sort 10 - EventTotal
]
| eventstats max(EventTotal) as EventTotal by host
| where isnotnull(EventTotal) AND isnotnull(EventCount)
| fields - EventTotal
| rename COMMENT as "Combine host and EventCode to make a single field named series"
| eval series = host." Error ".EventCode
| timechart limit=0 sum(EventCount) as count by series
Are you looking for error messages (text) or error codes (integers)? Your current query looks like it's looking for the latter.
I am looking for error codes in text; ex: Error reported: 503
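If the code only appears inside the message text, as in that example, it would have to be extracted into a field before it can be counted. A minimal, untested sketch with rex follows. It assumes every relevant event contains the literal text "Error reported:" followed by a numeric code, and ErrorCode is a made-up field name used only for illustration.

```
sourcetype=websphere "Error reported:"
| rex "Error reported:\s*(?<ErrorCode>\d+)"
| top 10 ErrorCode BY host
```

If the real messages vary in wording, the regular expression would need to be adjusted to match them.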