This would go in to Big data Analyzes.
I have a huge load of events coming from our network infrastructure.
When I look at one host and fan failure, I do see that it send syslog message every 10 minute.
Same for port-security message, every 5 minutes.
How to go trough the huge log, example last 4 hour and then see what message repeats all the time at equal time span the last 4 hour?
Look at this example data:
Time Event
09:01
09:02 Event_A
09:03 Event_B
09:04 Event_C
09:05 Event_D
09:06
09:07 Event_A
09:08
09:09 Event_B
09:10 Event_F
09:11
09:12 Event_A
09:13 Event_B
09:14 Event_C
09:15 Event_D
09:16
09:17 Event_A
09:18 Event_D
09:19
09:20 Event_F
09:21 Event_B
09:22 Event_A
09:23
09:24 Event_C
09:25
09:26 Event_D
09:27 Event_A
09:28
09:29
09:30 Event_B
09:31
09:32 Event_A
09:33
09:34 Event_C
09:35 Event_F
From these data you can see the following time span between each equal event type:
Event Type Diff in min
Event_A
Event_A 5
Event_A 5
Event_A 5
Event_A 5
Event_A 5
Event_A 5
Event_B
Event_B 6
Event_B 4
Event_B 8
Event_B 9
Event_C
Event_C 10
Event_C 10
Event_C 10
Event_D
Event_D 10
Event_D 3
Event_D 8
Event_F
Event_F 10
Event_F 15
From this I can see that my interesting data is Event_A
and Event_C
, since the occurs trough the whole time span at fixed time.
I have looked at cluster
, transaction
and more, but do not see how to get the time stored and calculated.
Time may not be exactly repeated on the same micro seconds, but I guess it can be rounded down to nearest second.
Try this which will do following:
yourBaseSearch to return the events
| rex "(?<eventType>[A-Za-z_]+)"
| reverse
| sort eventType
|autoregress _time as newTime p=1
|autoregress eventType as newEventType p=1
| eval timeDiff=if(eventType=newEventType,(_time - newTime)/60, 0)
| table eventType, _time , timeDiff, newEventType
Updated query that gives the eventTypes which only have single timeDifference value.
yourBaseSearch to return the events
| rex "(?[A-Za-z_]+)"
| reverse
| sort eventType
|autoregress _time as newTime p=1
|autoregress eventType as newEventType p=1
| eval timeDiff=if(eventType=newEventType,(_time - newTime)/60, 0)
| table eventType, _time , timeDiff, newEventType
| where timeDiff > 0
| transaction eventType
| table eventType, timeDiff | mvexpand timeDiff | stats count by eventType | where count=1
This gives me all the events and the time between them. I like to see the if some of the events repeats with equal time all the time. This to fine the background noise so I can fix them and reduce the load on Splunk license.
ok, if you want to find the events that have only a single time difference all the time then modify the search as follows, updating the same in answer as well:
yourBaseSearch to return the events
| rex "(?[A-Za-z_]+)"
| reverse
| sort eventType
|autoregress _time as newTime p=1
|autoregress eventType as newEventType p=1
| eval timeDiff=if(eventType=newEventType,(_time - newTime)/60, 0)
| table eventType, _time , timeDiff, newEventType
| where timeDiff > 0
| transaction eventType
| table eventType, timeDiff | mvexpand timeDiff | stats count by eventType | where count=1
It looks like this just count how many hit each eventType
found.
Ignoring the fact that timeDiff
should be within same distanse to each other. And if there are more than 1 evntType
where count=1
, it just ignore to display it. So at the final, it does not show noting.
No, that is not true. It is same command which was showing you eventtypes with their timeDiffs and all I did was appended the logic of first transaction on the eventTypes so that all the timeDiffs come as a multivalue field and from here pick only those eventtypes which have only single timeDiff. Try to remove everything after |where timeDiff>0 and you will see that this commands does bring you all the eventTypes with their timeDiffs and all it then does is chooses the ones which only have timeDiffs which are count=1
Not sure what your data actually looks like, but can you sort by event then time (or host, event then time)? And then use delta to get the diff?
... | sort Event _time | delta _time AS diff_in_sec p=1
I will up vote this, since it gave me the correct start point.
So did you get it work, then?
It works, but as I wrote it may need some more tweak to give perfect data. Eks I like to see what repeats all the time, not some that repeats 10 times within one hour, but in all hour of my window. This is to find the things that makes noise all the time at fixed interval.
so for a given event, you'd expect the diff between each to be consistent. And you'd also expect that number of events would be about the same as the window / diff? For example, it would be about 5 minutes between each EventA and therefore you'd expect 48 EventA's over a 4 hour window (240 minutes / 5 minutes repeat interval).
Not sure the best way to do that, but feels like you could get close with with a few more stats'ish command.
This is 100% correct, If you have a port-security error. (to many mac on one port), it will have a recovery timer, eks 5 min. It will try to enable the port, then fail again. Same with fan failure, it will have a fixed time between all events.
Try this
base search | reverse | delta _time as diff | stats count values(eventtype) as events by diff | mvexpand events | sort - diff
This works fine if its only one type of events. When its more events, it seems to calculate the diff between any event and not equal events. diff here is calculated between any events.
supposing event A is port security and event C is fan failure, the information contained must be different for sure right? just create a eventtype for each kind of event and use it in you base search. This way you select the specific kind of events you want to calculate de time difference.
Problem is that I do not know the event up front. Here are some example on events that repeats on our system:
%SEC-6-IPACCESSLOGRL:
%PM-4-ERR
%SEC_LOGIN-4-LOGIN_FAILED
%VMS-3-CONN_CONNECT_ERR
%SEC-6-IPACCESSLOGNP
%SFF8472-5-THRESHOLD_VIOLATION:
%PLATFORM-1-CRASHED:
%PFMA-2-PS_FAIL:
%OSPF-4-NET_TYPE_MISMATCH:
%ENVMON-4-FAN_LOW_RPM
%CDP-5-NEIGHBOR_ADDED
%PIM-6-INVALID_RP_JOIN:
%CDP-4-NATIVE_VLAN_MISMATCH:
%PORT_SECURITY-2-PSECURE_VIOLATION
%C4K_EBM-4-HOSTFLAPPING
%C4K_HWFLOWMAN-4-NFEFLINTERRUPT
I will accept this, since it gave me the most of my solution.
msg=event and I did also include host to get data sorted by host.
Its far from perfect. It does not take in count that event should be on the whole time span.
my search
| eval info=host."-".msg
| sort info _time
| delta _time AS diff
| eval diff=round(diff/3)*3
| stats count values(info) as events by diff
| mvexpand info
| where count>4 AND diff>0
| rex field=events "^(?<host>[^-]+)-(?<msg>\S+)$"
| table diff count host msg
| sort - count
@lakromani try the query I put in as answer, it will give u all that u need per event rather than working for only one event.