Alerting

Why are my real-time alerting searches no longer sending emails for matching events after upgrading to 6.1?

hexx
Splunk Employee

Right after upgrading to 6.1, I noticed that some scheduled real-time searches stop sending emails or triggering any other configured alert actions after they have been running for a while.

Alerts are sent initially, but after a few hours, no alerts are triggered even though events that match the search keep coming in.

Is this a known bug?

1 Solution

hexx
Splunk Employee

This is bug SPL-84357 and is specific to 6.1 and 6.1.1.

1 - Known symptoms

There are two known signatures of this bug:

  • In splunkd_access.log, entries recorded for the failing alerting searches show splunkd denying access (HTTP 401) to a POST to .../saved/searches/{search name}/notify?trigger.condition_state=1 from a local requester (client IP = 127.0.0.1), which is in fact the search process:


    127.0.0.1 - - [20/May/2014:12:43:16.856 -0700] "POST /servicesNS/admin/search/saved/searches/test%20per-result%20alerting/notify?trigger.condition_state=1 HTTP/1.0" 401 148 - - - 0ms

    In this example, splunkd denies the search process running the "test per-result alerting" search permission to create an alert item. These messages appear every time a matching event is found but the alert cannot be created.

  • In the affected search's search.log, we can see a corresponding client-side message, reporting the authorization denial from splunkd to create an alert item:


    05-20-2014 12:43:16.857 WARN SearchStateListener - Search listener notification returned non 2XX status code, status_code=401; Success. Removing fake artifacts, sid=rt_scheduler_adminsearch_RMD58061d21d6537baaa_at_1400533557_0.12

    These messages will appear every time a matching event is found, but the alert cannot be created.
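A quick way to check for the first signature is to grep splunkd_access.log for denied notify POSTs. The snippet below is a minimal sketch that tests the pattern against the sample entry above; in practice you would point grep at the real log file (path per a default install).

```shell
# Sample entry copied from above; against a live instance you would run:
#   grep -c 'notify?trigger.condition_state=1 HTTP/1.0" 401' "$SPLUNK_HOME/var/log/splunk/splunkd_access.log"
line='127.0.0.1 - - [20/May/2014:12:43:16.856 -0700] "POST /servicesNS/admin/search/saved/searches/test%20per-result%20alerting/notify?trigger.condition_state=1 HTTP/1.0" 401 148 - - - 0ms'

# Count denied (401) alert-notification POSTs; any hits indicate this bug signature.
echo "$line" | grep -c 'notify?trigger.condition_state=1 HTTP/1.0" 401'
```

A non-zero count means splunkd rejected at least one alert-notification request from a search process.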

2 - Root cause

The root cause is that the real-time alerting search process communicates back to splunkd using an authentication token that is inappropriately subject to the splunkd session timeout configured in server.conf, which defaults to 1 hour. From $SPLUNK_HOME/etc/system/default/server.conf:


[general]
sessionTimeout = 1h

This timeout is pushed back every time the search process talks to splunkd, which is why the issue does not occur for searches that match events frequently. However, if matching events arrive more than 1 hour apart, the token expires and the error occurs.
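The failure condition boils down to simple timing arithmetic, which the following sketch illustrates (this is only an illustration with assumed numbers, not a Splunk API):

```shell
# Default sessionTimeout is 1h = 3600 seconds; the token is refreshed on each match.
session_timeout=3600
gap_between_matches=7200   # e.g. matching events arriving two hours apart

# If the gap between matches exceeds the session timeout, the token has expired
# by the time the next match arrives, and the notify POST is denied with 401.
if [ "$gap_between_matches" -gt "$session_timeout" ]; then
  echo "token expired: subsequent notify POSTs will be denied with 401"
else
  echo "token still valid"
fi
```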

3 - Work-around

The work-around is to temporarily extend "sessionTimeout" in $SPLUNK_HOME/etc/system/local/server.conf to a value longer than the interval between two matched events, thus preventing the token from expiring:


[general]
sessionTimeout = 30d
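Applying the work-around from the command line could look like the sketch below. It assumes a default install layout; for safety the demo falls back to a temp directory when SPLUNK_HOME is unset, so point SPLUNK_HOME at your real install to apply it for real. A restart is needed for the new timeout to take effect.

```shell
# Point SPLUNK_HOME at your install; falls back to a temp dir for a dry run.
SPLUNK_HOME=${SPLUNK_HOME:-$(mktemp -d)}
mkdir -p "$SPLUNK_HOME/etc/system/local"

# Append the extended timeout to the local (upgrade-safe) server.conf.
cat >> "$SPLUNK_HOME/etc/system/local/server.conf" <<'EOF'
[general]
sessionTimeout = 30d
EOF

# Verify the setting landed.
grep sessionTimeout "$SPLUNK_HOME/etc/system/local/server.conf"

# Then restart so the setting takes effect:
#   "$SPLUNK_HOME/bin/splunk" restart
```

Remember to revert this setting once you have upgraded to a release containing the fix, since a 30-day session timeout also applies to interactive sessions.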

4 - Resolution

This issue is slated to be fixed in our next maintenance release: 6.1.2
