Hi Everyone
I have been asked to look into adding a button to a dashboard that lets the user pause Splunk's email alerting during an outage, scheduled downtime, and so on.
I am thinking along the lines of a dashboard button that fires a script, which would use sed to edit the specified savedsearches.conf file and change the email alert field to 0, then change it back to 1 when the button is pressed again.
I would like to know if anyone has done this and what the other options might be.
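For reference, the sed idea might be sketched roughly as below. This is only an illustration under assumptions: `toggle_email_alerts` is a made-up helper name, the file path is passed in by the caller, and it assumes the setting is written exactly as `action.email = 1` or `action.email = 0` on its own line.

```shell
#!/bin/sh
# Hypothetical helper (not part of Splunk): flip action.email between 1 and 0
# in a savedsearches.conf-style file.
#   $1 = path to the conf file, $2 = "pause" or "resume"
toggle_email_alerts() {
    conf_file=$1
    case $2 in
        pause)  from=1; to=0 ;;
        resume) from=0; to=1 ;;
        *) echo "usage: toggle_email_alerts FILE pause|resume" >&2; return 2 ;;
    esac
    # Write to a temp file first; in-place editing with sed -i is not portable.
    tmp_file=$(mktemp) || return 1
    sed "s/^action\.email = ${from}\$/action.email = ${to}/" "$conf_file" > "$tmp_file" \
        && cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}
```

One catch with a plain 1/0 toggle: "resume" would also switch on any alert that was already set to 0 before the pause, so it may be safer to pause with a distinct marker value. Splunk would also need to reload the saved searches before the change takes effect.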
Kind Regards
Peter
This is what I ended up doing.
Configure a dashboard panel with a time dropdown and a status readout.
The panel code looks like this:
<panel>
<input type="dropdown" token="value1" searchWhenChanged="true">
<label>Pause Email Alerting</label>
<choice value="0">Resume</choice>
<choice value="900">15 min</choice>
<choice value="1800">30 min</choice>
<choice value="3600">1 hour</choice>
<choice value="7200">2 hours</choice>
<choice value="14400">4 hours</choice>
<choice value="28800">8 hours</choice>
<choice value="57600">16 hours</choice>
<choice value="86400">24 hours</choice>
<choice value="172800">48 hours</choice>
<default>0</default>
<initialValue>0</initialValue>
</input>
<single>
<title>Email Alerting Status</title>
<search>
<query>| inputlookup alert-pause-status.csv</query>
<earliest>-1s</earliest>
<latest>now</latest>
</search>
<option name="height">10</option>
<option name="link.visible">false</option>
<option name="refresh.time.visible">false</option>
<option name="refresh.auto.interval">5</option>
</single>
</panel>
<panel depends="$nothing1$">
<single>
<search>
<query>| alertemailoff $value1$</query>
<earliest>-1s</earliest>
<latest>now</latest>
</search>
<option name="height">10</option>
<option name="link.visible">false</option>
<option name="refresh.time.visible">false</option>
<option name="refresh.auto.interval">15</option>
</single>
</panel>
Then I configured commands.conf in the app's local directory to point at a Perl script:
[alertemailoff]
filename = pause-splunk-alerting.pl
type = perl
Create a file called alert-pause-status.csv in the app's lookups directory:
status
active
Then create a Perl script that edits the savedsearches.conf file and reloads the config.
Please forgive my very crude Perl script; if anyone wants to rewrite it and post, be my guest.
#!/usr/bin/perl
$file = "/opt/splunk/etc/apps/eproduct/local/savedsearches.conf";
$status = "/opt/splunk/etc/apps/eproduct/lookups/alert-pause-status.csv";
$statustas = "/opt/splunk/etc/apps/tasman/lookups/alert-pause-status.csv";
$pausefor = $ARGV[0];
$check = $$;
$pid = "/opt/splunk/etc/apps/eproduct/bin/pause-splunk-alerting.pid";
use POSIX qw( strftime );
my $finish = strftime("%a %H:%M", localtime(time + $pausefor));
if($pausefor!=0){
if(`ps -ef | egrep -v "grep|$check" | grep pause-splunk-alerting.pl`) {
print "Process Exists Exiting\n";
exit;
}else{
open(PID, "> $pid") || die "could not open '$pid' $!";
print PID "$$\n";
close(PID);
print "Starting Process\n";
}
}else{
# Resume: stop the sleeping pause process, if there is one
if (open(IN, $pid)) {
$killpid = <IN>;
close IN;
chomp $killpid;
`kill $killpid`;
unlink $pid;
}
print "Resume Alerting\n";
}
open (IN, $file) || die "Cannot open file ".$file." for read";
@lines=<IN>;
close IN;
open (OUT, ">", $file) || die "Cannot open file ".$file." for write";
foreach $line (@lines)
{
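# Pause: "false" (rather than 0) means alerts that were already set to 0 are not re-enabled on resume
$line =~ s/action\.email = 1/action.email = false/ig;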
print OUT $line;
}
close OUT;
open (IN, $status) || die "Cannot open file ".$status." for read";
@lines=<IN>;
close IN;
open (OUT, ">", $status) || die "Cannot open file ".$status." for write";
foreach $line (@lines)
{
$line =~ s/active/paused til $finish/ig;
print OUT $line;
}
close OUT;
open (IN, $statustas) || die "Cannot open file ".$statustas." for read";
@lines=<IN>;
close IN;
open (OUT, ">", $statustas) || die "Cannot open file ".$statustas." for write";
foreach $line (@lines)
{
$line =~ s/active/paused til $finish/ig;
print OUT $line;
}
close OUT;
`/opt/splunk/bin/splunk _internal call /services/saved/searches/_reload -auth <splunk admin>:<password>`;
sleep($pausefor);
open (IN, $file) || die "Cannot open file ".$file." for read";
@lines=<IN>;
close IN;
open (OUT, ">", $file) || die "Cannot open file ".$file." for write";
foreach $line (@lines)
{
# Resume: only lines this script set to "false" are switched back to 1
$line =~ s/action\.email = false/action.email = 1/ig;
print OUT $line;
}
close OUT;
open (IN, $status) || die "Cannot open file ".$status." for read";
@lines=<IN>;
close IN;
open (OUT, ">", $status) || die "Cannot open file ".$status." for write";
foreach $line (@lines)
{
$line =~ s/paused.*$/active/ig;
print OUT $line;
}
close OUT;
open (IN, $statustas) || die "Cannot open file ".$statustas." for read";
@lines=<IN>;
close IN;
open (OUT, ">", $statustas) || die "Cannot open file ".$statustas." for write";
foreach $line (@lines)
{
$line =~ s/paused.*$/active/ig;
print OUT $line;
}
close OUT;
`/opt/splunk/bin/splunk _internal call /services/saved/searches/_reload -auth <splunk admin>:<password>`;
unlink $pid if -e $pid;
Now when you select a time from the dropdown, it runs the Perl script, which changes "action.email = 1" to "action.email = false" in savedsearches.conf, changes the status file from "active" to "paused til ...", reloads the config, sleeps for the selected time, and then puts everything back again.
You can also select Resume from the dropdown to put it all back and resume email alerts.
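If anyone wants to try the same cycle on copies of the files before pointing it at a live install, here is a rough shell sketch of it. It is not a drop-in replacement for the Perl script: the function names are made up, the file paths are passed in as arguments, the Splunk config reload is left out, the status text is just "paused" rather than the finish time, and the patterns are anchored and case-sensitive (the Perl regexes are neither).

```shell
#!/bin/sh
# Rough sketch of the pause / sleep / resume cycle from the Perl script.
# All function names are hypothetical; file paths are arguments.

# Apply one sed substitution ($2 -> $3) to file $1 via a temp file.
# $2 and $3 are used directly as sed pattern and replacement, so keep them simple.
substitute_in_file() {
    tmp_file=$(mktemp) || return 1
    sed "s/$2/$3/" "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
    rm -f "$tmp_file"
}

# $1 = copy of savedsearches.conf, $2 = status CSV
pause_alerts() {
    substitute_in_file "$1" '^action\.email = 1$' 'action.email = false'
    substitute_in_file "$2" '^active$' 'paused'
}

resume_alerts() {
    substitute_in_file "$1" '^action\.email = false$' 'action.email = 1'
    substitute_in_file "$2" '^paused.*$' 'active'
}

# $3 = seconds to stay paused; a config reload would follow each step.
pause_cycle() {
    pause_alerts "$1" "$2"
    sleep "$3"
    resume_alerts "$1" "$2"
}
```

Keeping pause and resume as separate functions means the Resume choice can call `resume_alerts` directly instead of running the whole cycle with a zero-second sleep.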
Crude but effective, any improvements are welcome.
Cheers
Nice !
I removed your admin password from the script 😉
Yeh those credentials would be hit and miss for sure lol!
Hi proylea,
You can run a REST POST against this endpoint:
/servicesNS/nobody/search/alerts/alert_actions/email
and change the mailserver setting to some nonexistent value, as @woodcock mentioned. A working example would be:
curl -k -u admin:changeme https://localhost:8089/servicesNS/nobody/search/alerts/alert_actions/email -d mailserver=foobar
This can also be used in a custom search command or an external script; as always, it depends on your use case 😉
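As a sketch of the external-script variant, the curl call could be wrapped like this. `set_mailserver` is a made-up function name, and `SPLUNK_URL`, `SPLUNK_AUTH` and `DRY_RUN` are assumed environment variables for this example, not Splunk settings. With `DRY_RUN` set, it only prints the request instead of sending it.

```shell
#!/bin/sh
# Hypothetical wrapper around the curl call above.
#   $1 = value to write to the mailserver setting
set_mailserver() {
    url="${SPLUNK_URL:-https://localhost:8089}/servicesNS/nobody/search/alerts/alert_actions/email"
    if [ -n "$DRY_RUN" ]; then
        # Show what would be sent without contacting Splunk.
        echo "POST $url mailserver=$1"
        return 0
    fi
    # -k skips certificate checks, as in the original example.
    curl -k -s -u "$SPLUNK_AUTH" "$url" -d "mailserver=$1"
}
```

Calling `set_mailserver foobar` at the start of the outage and `set_mailserver` with your real mail host afterwards would pause and restore delivery. Note that this affects email for all alerts, not only selected ones.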
Hope this helps ...
cheers, MuS
Yes, that would be OK, but I don't want to disable all email alerts, only the specific ones related to the apps.
I have a working solution now; it's crude, but it works.
Most of the time, I create a lookup table which holds server names and alert_status, like:
server,alert_status
foo,active
boo,down
and use an automatic lookup. All the alerts are saved and include the field alert_status="active" in the base search. This way you can put only some servers in maintenance mode while others are still active.
cheers, MuS
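Flipping a server in and out of maintenance mode then comes down to editing one row of that lookup. A minimal sketch, assuming a two-column CSV laid out like the sample above; `set_alert_status` is a hypothetical helper name and the lookup path is whatever your app uses:

```shell
#!/bin/sh
# Hypothetical helper: set alert_status for one server in a two-column CSV lookup.
#   $1 = lookup file, $2 = server name, $3 = new status (e.g. active, down)
set_alert_status() {
    lookup_file=$1
    server=$2
    new_status=$3
    tmp_file=$(mktemp) || return 1
    # Leave the header (line 1) alone; rewrite column 2 only on the matching row.
    awk -F',' -v OFS=',' -v s="$server" -v st="$new_status" \
        'NR > 1 && $1 == s { $2 = st } { print }' "$lookup_file" > "$tmp_file" \
        && cat "$tmp_file" > "$lookup_file"
    rm -f "$tmp_file"
}
```

For example, `set_alert_status servers.csv foo down` would silence alerts for foo only, and running it again with `active` brings them back, with no config reload involved.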
Yeah, I use similar lookups for alert acknowledge and clear type functionality.
The easiest way is to misconfigure (break) the System Settings -> Email Alert Settings -> Mail Server Settings. For example, change Mail Host to something like ThisHostIsDeliberatelyBrokenUntilOutageIsFixed-ChangeItBackToASAP.
Cool idea!
Actually, most monitoring and alerting systems have this as standard functionality; perhaps Splunk could consider adding it in an update.