Splunk Search

How do I create an alert that checks every 30 minutes whether an FTP attempt to another host has failed 3 times in a row?

damonmanni
Path Finder

Goal:
If "[FATAL]" FTP message to same destination host "host-xyz" is found 3 times within 1 minute, then trigger alert to send email to admin.

Alert results:
Results should be grouped by time, showing which hostnames failed within a 1-minute period.

Time               host 
TIMESTAMP          host-xyz                    
TIMESTAMP          host-albert
                   host-jimbob
TIMESTAMP          host-abc

My problem:
1) I am getting most of what I need from my query, but I don't know how to organize the results to display as described above.
2) I don't think I am correctly counting 3 events within 1 minute, as my current alert results below show.

Sample Log data events:

2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz  1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-jojo 1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz  1/1 attempt
2018-Mar-19 20:18:26 [FATAL] ./ftphub_push.sh could not FTP file to host-jojo 1/1 attempt
2018-Mar-19 20:17:56 [FATAL] ./ftphub_push.sh could not FTP file to host-xyz  1/1 attempt
...etc...

Field extract created:
I created a field called 'failed_host' that holds the hostname found in an event (e.g. host-xyz).
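
For reference, an inline rex along these lines should pull the same field out of the sample events above (just a sketch; the actual field was set up as a saved field extraction, so the exact pattern may differ):

index=milo sourcetype=rto FATAL
| rex "could not FTP file to (?<failed_host>\S+)"
| table _time failed_host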

Current query:

index=milo sourcetype=rto  FATAL earliest=-30m@d latest=now | bucket _time span=1m | stats count by failed_host _time | eval occurred=if(count!=3,"FTP failed", null()) | where isnotnull(occurred) | table occurred failed_host _time count

Current alert results:

occurred     failed_host   _time                 count
FTP failed   abc837        2018-03-12 08:03:00   2
FTP failed   abc837        2018-03-12 08:04:00   2
FTP failed   abc840        2018-03-19 17:17:00   2
FTP failed   abc840        2018-03-19 17:18:00   2
FTP failed   abc841        2018-03-19 17:17:00   2
FTP failed   abc841        2018-03-19 17:18:00   2
FTP failed   abc842        2018-03-12 08:03:00   2
FTP failed   abc842        2018-03-12 08:04:00   2
FTP failed   abc844        2018-03-12 08:03:00   4

omerl
Path Finder

I would recommend using the transaction command, as it seems to do exactly what you need.
So I would change this query:
index=milo sourcetype=rto FATAL earliest=-30m@d latest=now | bucket _time span=1m | stats count by failed_host _time | eval occurred=if(count!=3,"FTP failed", null()) | where isnotnull(occurred) | table occurred failed_host _time count
to something more like:
index=milo sourcetype=rto FATAL earliest=-30m@m
| transaction failed_host maxspan=1m
| search eventcount >= 3
| table failed_host _time eventcount

Splunk will now look for transactions of the same failing host within 1 minute (= maxspan) and combine them into one event, which includes an eventcount field counting the number of events in the transaction. You may also find the duration field interesting (I left it out of the query), since it tells you exactly how long the transaction lasted.
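
For example, the same search with the duration (in seconds) added to the table would be:

index=milo sourcetype=rto FATAL earliest=-30m@m
| transaction failed_host maxspan=1m
| search eventcount >= 3
| table failed_host _time eventcount duration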

I hope it helps you!
Omer

edit:
To organize the results into time groups, I would add this to the end of my query:
| bin _time span=1m | stats list(*) as * by _time
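
Putting it all together, the full search would look something like this (the same pieces from above, joined end to end):

index=milo sourcetype=rto FATAL earliest=-30m@m
| transaction failed_host maxspan=1m
| search eventcount >= 3
| table failed_host _time eventcount
| bin _time span=1m
| stats list(*) as * by _time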


damonmanni
Path Finder

Omer,

You nailed it! My customer is very happy and so am I.

Your response was quick, and your suggestion was easy to implement and dead on. The extra edit you added at the bottom made it even better. The report looks sweet.

I really appreciate this help.
Thanks much,
Damon
