I would like to create a report that compares the number of times a specific error appears in one log file against a count of events from another source, expressed as a ratio over time.
source="/var/log/messages" "Specific Error message" | stats count by _time
and
source="/var/log/httpd/*access_log" uri_path=/function/function* | transaction clientip maxspan=60m | stats count by _time
Basically, a raw count of the errors is acceptable. I would like to chart the number of errors from the first search against the number of unique IPs using this function, as a ratio, and alert if that ratio gets too high.
You may already realize this, but it sounds like you are going to need more than one search to accomplish this. At the very least, you probably will not find a single search that will work both for sending an alert and for building a nice chart (unless you only want an email with a link to the results, or an RSS feed).
It seems to me that the simplest solution would be to build a dashboard with two charts showing the same time range. This would let you very simply (and visually) see the correlation in any spikes of activity. You could use stacked mode for your access log chart so you can see overall volume and repeating clientip values at the same time.
Chart 1:
source="/var/log/messages" "Specific Error message" | timechart span=5m count
Chart 2:
source="/var/log/httpd/*access_log" uri_path=/function/function* | timechart span=5m count by clientip
BTW: I'm not sure what your transaction command is doing for you. You are combining all the events by clientip, which is fine, except that you then just do a count of those. So why not do a | stats dc(clientip) by _time and skip the transaction overhead? (Perhaps some part of the complexity of your search was lost when you simplified it to post it to this site.)
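In concrete terms, since what you ultimately want is unique client IPs per time bucket, a transaction-free version of your access log search might look something like this (a sketch, not tested against your data):
source="/var/log/httpd/*access_log" uri_path=/function/function* | timechart span=5m dc(clientip)
The dc() function counts distinct values, so each 5-minute bucket reports how many different clients hit the function, regardless of how many requests each one made.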
As far as finding a way to combine these events into a single search, that's a bit more difficult and very data specific. For example, one thing that's not clear is whether your /var/log/messages source contains a clientip field, so that you can do a direct correlation with your access log events. The answer will change your search approach.
If you are trying to correlate these two searches purely based on _time, then you will probably want to use the bucket command (something like | bucket _time span=5m).
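For example, bucketing your error events onto fixed 5-minute boundaries so they can line up with a similarly bucketed access log search might look like this (the field name "errors" is just illustrative):
source="/var/log/messages" "Specific Error message" | bucket _time span=5m | stats count as errors by _time
The bucket command rounds each event's _time down to its 5-minute boundary, so a later stats or join on _time matches rows from both searches.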
You can use the join command to pull events from multiple searches. Another approach would be to use the append command to simply slap two separate searches together, then use a stats or a transaction command to pull the events together based on field(s).
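A rough, untested sketch of the append approach, using bucketed _time as the common field (the "errors" and "sessions" field names are just placeholders):
source="/var/log/messages" "Specific Error message"
  | bucket _time span=5m | stats count as errors by _time
  | append [ search source="/var/log/httpd/*access_log" uri_path=/function/function*
      | bucket _time span=5m | stats dc(clientip) as sessions by _time ]
  | stats sum(errors) as errors sum(sessions) as sessions by _time
The final stats collapses the two row sets onto a single row per time bucket, with one column from each source, which is exactly the shape you need to compute a ratio.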
Yeah, join can be a pain to get working properly. You will probably want something like:
<search1> | join type=outer max=0 overwrite=false clientip [ search <search2> ]
Thanks.
I do have "clientip" that I can use to correlate between the two sources.
I was using "transaction" to group the logs together into a session. The stats dc(clientip) approach is a much better idea. I want to count the number of sessions and the number of errors, and alert if that ratio gets too high.
I am trying to use "join" as you suggest using the clientip, but don't really see how to make it fit.
I do have the 2 different searches working as a dashboard, but I don't have any kind of alert if the ratio gets too high.
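For the alert piece, one option is a single scheduled search that computes the ratio itself and only returns rows when the threshold is crossed. An untested sketch (the 0.25 threshold and the "errors"/"sessions" field names are placeholders you would tune for your data):
source="/var/log/messages" "Specific Error message"
  | bucket _time span=5m | stats count as errors by _time
  | append [ search source="/var/log/httpd/*access_log" uri_path=/function/function*
      | bucket _time span=5m | stats dc(clientip) as sessions by _time ]
  | stats sum(errors) as errors sum(sessions) as sessions by _time
  | eval ratio = errors / sessions
  | where ratio > 0.25
Schedule that search and set the alert condition to trigger when the number of results is greater than zero; any row that survives the where clause represents a 5-minute bucket whose error-to-session ratio exceeded the threshold.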