Splunk Search

How do I create an alert that checks every 30 minutes whether an FTP attempt to another host has failed 3 times in a row?

damonmanni
Path Finder

Goal:
If a "[FATAL]" FTP message for the same destination host (e.g. "host-xyz") is found 3 times within 1 minute, trigger an alert that sends an email to the admin.

Alert results:
Results should be grouped by time, showing which hostnames failed within each 1-minute period.

Time               host 
TIMESTAMP          host-xyz                    
TIMESTAMP          host-albert
                   host-jimbob
TIMESTAMP          host-abc

My problem:
1) I am getting most of what I need from my query, but I don't know how to organize the results to display as described above.
2) I don't think I am counting 3 events within 1 minute correctly, as my current alert results below show.

Sample Log data events:

2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz  1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-jojo 1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz  1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-jojo 1/1 attempt
2018-Mar-19 20:17:56 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz  1/1 attempt
...etc...

Field extract created:
I created a field extraction called 'failed_host' that captures the hostname found in an event (e.g. host-xyz).
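For reference, the extraction logic can be sketched in Python. The regex below is an assumption based on the sample events above, not the actual field extraction definition:

```python
import re

# One of the sample log lines from above.
line = "2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz  1/1 attempt"

# Hypothetical pattern for 'failed_host': grab the token after "FTP file to".
m = re.search(r"FTP file to (\S+)", line)
failed_host = m.group(1) if m else None
```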

Current query:

index=milo sourcetype=rto  FATAL earliest=-30m@d latest=now | bucket _time span=1m | stats count by failed_host _time | eval occurred=if(count!=3,"FTP failed", null()) | where isnotnull(occurred) | table occurred failed_host _time count
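To see where the counting goes wrong, here is a minimal Python sketch of what `bucket _time span=1m | stats count by failed_host _time` computes. The sample data mirrors the log events above; everything else is illustrative:

```python
from collections import Counter
from datetime import datetime

# (timestamp, failed_host) pairs, mirroring the sample log data above.
events = [
    ("2018-03-19 20:18:26", "host-xyz"),
    ("2018-03-19 20:18:26", "host-jojo"),
    ("2018-03-19 20:18:26", "host-xyz"),
    ("2018-03-19 20:18:26", "host-jojo"),
    ("2018-03-19 20:17:56", "host-xyz"),
]

def minute_bucket(ts: str) -> str:
    """Truncate a timestamp to its 1-minute bucket, like `bucket _time span=1m`."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:%M:00")

# Equivalent of `stats count by failed_host _time`.
counts = Counter((host, minute_bucket(ts)) for ts, host in events)

# The alert condition should be `count >= 3`; note that the original
# `count!=3` keeps every bucket EXCEPT those with exactly 3 failures,
# which is why the results above show rows with count=2.
alerts = {key: c for key, c in counts.items() if c >= 3}
```

On this sample data no bucket reaches 3 failures, so `alerts` is empty.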

Current alert results:

occurred      failed_host  _time                count
FTP failed    abc837       2018-03-12 08:03:00  2
FTP failed    abc837       2018-03-12 08:04:00  2
FTP failed    abc840       2018-03-19 17:17:00  2
FTP failed    abc840       2018-03-19 17:18:00  2
FTP failed    abc841       2018-03-19 17:17:00  2
FTP failed    abc841       2018-03-19 17:18:00  2
FTP failed    abc842       2018-03-12 08:03:00  2
FTP failed    abc842       2018-03-12 08:04:00  2
FTP failed    abc844       2018-03-12 08:03:00  4

omerl
Path Finder

I would recommend using the transaction command, as it seems to do exactly what you need.
So I would change this query:
index=milo sourcetype=rto FATAL earliest=-30m@d latest=now | bucket _time span=1m | stats count by failed_host _time | eval occurred=if(count!=3,"FTP failed", null()) | where isnotnull(occurred) | table occurred failed_host _time count
to something more like:
index=milo sourcetype=rto FATAL earliest=-30m@m
| transaction failed_host maxspan=1m
| search eventcount >= 3
| table failed_host _time eventcount

Now Splunk will look for transactions of the same failing host within 1 minute (= maxspan) and combine them into one event, which includes an eventcount field with the number of events in the transaction. You may also find the duration field interesting (I excluded it from the query), since it tells you exactly how long the transaction lasted.
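To illustrate roughly how `transaction failed_host maxspan=1m` groups events, here is a simplified Python sketch. The event data and helper names are hypothetical, and real transaction behavior has more options (maxpause, keepevicted, etc.) than this models:

```python
from datetime import datetime, timedelta

# Hypothetical (timestamp, failed_host) events.
raw = [
    ("2018-03-19 20:18:26", "host-xyz"),
    ("2018-03-19 20:18:20", "host-xyz"),
    ("2018-03-19 20:17:56", "host-xyz"),
    ("2018-03-19 20:10:00", "host-jojo"),
]

def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

def transactions(events, maxspan=timedelta(minutes=1)):
    """Group events per host so each group spans at most maxspan,
    roughly like `transaction failed_host maxspan=1m`."""
    open_groups = {}   # host -> transaction currently being built
    closed = []
    for ts, host in sorted(events, key=lambda e: parse(e[0])):
        t = parse(ts)
        g = open_groups.get(host)
        if g and t - g["start"] <= maxspan:
            g["end"] = t
            g["eventcount"] += 1
        else:
            if g:
                closed.append(g)
            open_groups[host] = {"failed_host": host, "start": t,
                                 "end": t, "eventcount": 1}
    closed.extend(open_groups.values())
    return closed

# Equivalent of `| search eventcount >= 3`.
alerts = [t for t in transactions(raw) if t["eventcount"] >= 3]
```

Here the three host-xyz failures land within one minute, so they merge into a single transaction with eventcount 3 and a 30-second duration, while the lone host-jojo failure stays below the threshold.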

I hope it helps you!
Omer

edit:
To organize the results as groups of time I would add this to the end of my query:
| bin _time span=1m | stats list(*) as * by _time
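The effect of that `bin _time span=1m | stats list(*) as * by _time` step can be sketched in Python: one result row per minute, with all hosts that alerted in that minute listed together. The sample rows are hypothetical:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical alert rows: (_time, failed_host, eventcount).
rows = [
    ("2018-03-19 17:17:12", "host-albert", 3),
    ("2018-03-19 17:17:40", "host-jimbob", 4),
    ("2018-03-19 20:18:05", "host-xyz", 3),
]

def minute(ts: str) -> str:
    """Truncate a timestamp to its minute, like `bin _time span=1m`."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:%M:00")

# Like `stats list(*) as * by _time`: collect the hosts per minute bucket.
grouped = defaultdict(list)
for ts, host, count in rows:
    grouped[minute(ts)].append(host)
```

This produces the grouped layout from the goal section: one timestamp with the hosts that failed in that minute underneath it.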



damonmanni
Path Finder

Omer,

You nailed it! My customer is very happy and so am I.

Your response was fast, and your suggestion was easy to implement and dead on. The extra edit added at the bottom made it even better. The report looks sweet.

I really appreciate this help.
Thanks much,
Damon
