Splunk Search

How to find repeated events that have a fixed time pattern?

lakromani
Builder

This would go in to Big data Analyzes.

I have a huge load of events coming from our network infrastructure.

When I look at one host and fan failure, I do see that it send syslog message every 10 minute.
Same for port-security message, every 5 minutes.

How to go trough the huge log, example last 4 hour and then see what message repeats all the time at equal time span the last 4 hour?

Look at this example data:

Time    Event
09:01   
09:02   Event_A
09:03   Event_B
09:04   Event_C
09:05   Event_D
09:06   
09:07   Event_A
09:08   
09:09   Event_B
09:10   Event_F
09:11   
09:12   Event_A
09:13   Event_B
09:14   Event_C
09:15   Event_D
09:16   
09:17   Event_A
09:18   Event_D
09:19   
09:20   Event_F
09:21   Event_B
09:22   Event_A
09:23   
09:24   Event_C
09:25   
09:26   Event_D
09:27   Event_A
09:28   
09:29   
09:30   Event_B
09:31   
09:32   Event_A
09:33   
09:34   Event_C
09:35   Event_F

From these data you can see the following time span between each equal event type:

Event Type  Diff in min
Event_A 
Event_A 5
Event_A 5
Event_A 5
Event_A 5
Event_A 5
Event_A 5


Event_B 
Event_B 6
Event_B 4
Event_B 8
Event_B 9

Event_C 
Event_C 10
Event_C 10
Event_C 10

Event_D 
Event_D 10
Event_D 3
Event_D 8

Event_F 
Event_F 10
Event_F 15

From this I can see that my interesting data is Event_A and Event_C, since the occurs trough the whole time span at fixed time.
I have looked at cluster, transaction and more, but do not see how to get the time stored and calculated.
Time may not be exactly repeated on the same micro seconds, but I guess it can be rounded down to nearest second.

0 Karma

gokadroid
Motivator

Try this which will do following:

yourBaseSearch to return the events
| rex "(?<eventType>[A-Za-z_]+)"
| reverse
| sort eventType
|autoregress _time as newTime p=1
|autoregress eventType as newEventType p=1
| eval timeDiff=if(eventType=newEventType,(_time - newTime)/60, 0)
| table eventType, _time , timeDiff, newEventType
  1. rex the eventType out as a field if not already done. If already extracted remove the rex command portion and use yourField in place of "eventType" in all the places.
  2. Get all your events in reverse chronological order using reverse
  3. Sort events on field eventType so all the Event_As are one after the other followed by all of Event_B and so on.
  4. autoregress to add the interested data from prev event (which is time for us) to current event. So _time of prev event is added to current event as newTime. p=1 which is also by default is ensuring to go back only one event.
  5. autoregress on the eventType as well as we want to know when do our events Switch from Event_A to Event_B.
  6. eval the difference of current event time and prev event time based on whether event switched or stayed same. Since the index time (assuming goes only upto seconds HH:MM:SS) I did a (/60) to calculate the seconds. If its different for you, do the appropriate division to get difference the seconds.
  7. Tabulate the eventType, the actual time of event and time seconds difference between previous and current event.

Updated query that gives the eventTypes which only have single timeDifference value.

yourBaseSearch to return the events
 | rex "(?[A-Za-z_]+)"
 | reverse
 | sort eventType
 |autoregress _time as newTime p=1
 |autoregress eventType as newEventType p=1
 | eval timeDiff=if(eventType=newEventType,(_time - newTime)/60, 0)
 | table eventType, _time , timeDiff, newEventType
| where timeDiff > 0
| transaction eventType
| table eventType, timeDiff | mvexpand timeDiff | stats count by eventType | where count=1
0 Karma

lakromani
Builder

This gives me all the events and the time between them. I like to see the if some of the events repeats with equal time all the time. This to fine the background noise so I can fix them and reduce the load on Splunk license.

0 Karma

gokadroid
Motivator

ok, if you want to find the events that have only a single time difference all the time then modify the search as follows, updating the same in answer as well:

yourBaseSearch to return the events
 | rex "(?[A-Za-z_]+)"
 | reverse
 | sort eventType
 |autoregress _time as newTime p=1
 |autoregress eventType as newEventType p=1
 | eval timeDiff=if(eventType=newEventType,(_time - newTime)/60, 0)
 | table eventType, _time , timeDiff, newEventType
| where timeDiff > 0
| transaction eventType
| table eventType, timeDiff | mvexpand timeDiff | stats count by eventType | where count=1
0 Karma

lakromani
Builder

It looks like this just count how many hit each eventType found.
Ignoring the fact that timeDiff should be within same distanse to each other. And if there are more than 1 evntType where count=1, it just ignore to display it. So at the final, it does not show noting.

0 Karma

gokadroid
Motivator

No, that is not true. It is same command which was showing you eventtypes with their timeDiffs and all I did was appended the logic of first transaction on the eventTypes so that all the timeDiffs come as a multivalue field and from here pick only those eventtypes which have only single timeDiff. Try to remove everything after |where timeDiff>0 and you will see that this commands does bring you all the eventTypes with their timeDiffs and all it then does is chooses the ones which only have timeDiffs which are count=1

0 Karma

maciep
Champion

Not sure what your data actually looks like, but can you sort by event then time (or host, event then time)? And then use delta to get the diff?

... | sort Event _time | delta _time AS diff_in_sec p=1

lakromani
Builder

I will up vote this, since it gave me the correct start point.

0 Karma

maciep
Champion

So did you get it work, then?

0 Karma

lakromani
Builder

It works, but as I wrote it may need some more tweak to give perfect data. Eks I like to see what repeats all the time, not some that repeats 10 times within one hour, but in all hour of my window. This is to find the things that makes noise all the time at fixed interval.

0 Karma

maciep
Champion

so for a given event, you'd expect the diff between each to be consistent. And you'd also expect that number of events would be about the same as the window / diff? For example, it would be about 5 minutes between each EventA and therefore you'd expect 48 EventA's over a 4 hour window (240 minutes / 5 minutes repeat interval).

Not sure the best way to do that, but feels like you could get close with with a few more stats'ish command.

0 Karma

lakromani
Builder

This is 100% correct, If you have a port-security error. (to many mac on one port), it will have a recovery timer, eks 5 min. It will try to enable the port, then fail again. Same with fan failure, it will have a fixed time between all events.

0 Karma

sundareshr
Legend

Try this

base search | reverse | delta _time as diff | stats count values(eventtype) as events by diff | mvexpand events | sort - diff

lakromani
Builder

This works fine if its only one type of events. When its more events, it seems to calculate the diff between any event and not equal events. diff here is calculated between any events.

0 Karma

diogofgm
SplunkTrust
SplunkTrust

supposing event A is port security and event C is fan failure, the information contained must be different for sure right? just create a eventtype for each kind of event and use it in you base search. This way you select the specific kind of events you want to calculate de time difference.

------------
Hope I was able to help you. If so, some karma would be appreciated.
0 Karma

lakromani
Builder

Problem is that I do not know the event up front. Here are some example on events that repeats on our system:

%SEC-6-IPACCESSLOGRL:
%PM-4-ERR
%SEC_LOGIN-4-LOGIN_FAILED
%VMS-3-CONN_CONNECT_ERR
%SEC-6-IPACCESSLOGNP
%SFF8472-5-THRESHOLD_VIOLATION:
%PLATFORM-1-CRASHED:
%PFMA-2-PS_FAIL:
%OSPF-4-NET_TYPE_MISMATCH:
%ENVMON-4-FAN_LOW_RPM
%CDP-5-NEIGHBOR_ADDED
 %PIM-6-INVALID_RP_JOIN:
%CDP-4-NATIVE_VLAN_MISMATCH:
%PORT_SECURITY-2-PSECURE_VIOLATION
%C4K_EBM-4-HOSTFLAPPING
%C4K_HWFLOWMAN-4-NFEFLINTERRUPT
0 Karma

lakromani
Builder

I will accept this, since it gave me the most of my solution.
msg=event and I did also include host to get data sorted by host.
Its far from perfect. It does not take in count that event should be on the whole time span.

my search
| eval info=host."-".msg 
| sort info _time 
| delta _time AS diff 
| eval diff=round(diff/3)*3 
| stats count values(info) as events by diff 
| mvexpand info 
| where count>4 AND diff>0 
| rex field=events "^(?<host>[^-]+)-(?<msg>\S+)$" 
| table diff count host msg 
| sort - count
0 Karma

gokadroid
Motivator

@lakromani try the query I put in as answer, it will give u all that u need per event rather than working for only one event.

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...