All Apps and Add-ons

Intermittent events missing when pulling through Microsoft Log Analytics Add-on (Formerly Known as OMS)

anwar114
Explorer

Hi,

Intermittent events are missing when pulling through the Microsoft Log Analytics Add-on (Formerly Known as OMS).
I cannot find any errors or warnings in the internal logs.
When I tried to pull with a larger Event Delay / Lag Time, it pulled all the events.
So it is working, but when I changed it back to 15 minutes the intermittent event loss returned.
interval : 60

Also, is there a plan for Python 3 support? Eventually Splunk 8 will move to Python 3.


jkat54
SplunkTrust

When a Log Analytics input runs, it pulls data from the Log Analytics API. To pull the data, we must specify two timestamps to search between for events, so the data pull requires a start_date and an end_date. All timestamps used are specified in UTC.

IMPORTANT: Due to lag in writing events to Log Analytics (explained here: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-ingestion-time), we would have missing data if we always pulled the latest X minutes of data from the API. Therefore this app was developed with a setting called "event_lag".

event_lag is specified in seconds, and the value is used to push the API query back in time by the number of seconds specified. That is to say, if you set an event_lag of 60 seconds, the input will always look for data that is at least 60 seconds old. In other words, we subtract event_lag from the end_date used in the query in order to offset the data collection by the amount of event_lag specified. It is also important to note that event_lag is never subtracted from the start_date.
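
As a rough sketch (not the add-on's actual code; the variable names simply mirror the terms used in this thread), the end_date offset looks like this in Python:

    from datetime import datetime, timedelta

    event_lag = 60  # seconds, as configured on the input

    # end_date is the current UTC time pushed back by event_lag seconds
    # (microseconds removed), so the query never asks for events newer
    # than (now - event_lag).
    end_date = datetime.utcnow().replace(microsecond=0) - timedelta(seconds=event_lag)

    # Note: event_lag is never subtracted from start_date; start_date comes
    # from the checkpoint (or from the configured start_date on the first run).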

The next time/date field to consider is the interval of the collection. If you have an event_lag of 30 minutes and an interval of 1 minute, you will duplicate 29 minutes of data on every execution. It is therefore recommended that your event_lag equal your interval, OR, if you are highly suspicious of possible data loss, you might prefer some duplication over the possibility of missing events due to the Log Analytics ingestion lag described here (https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-ingestion-time).
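
To make that arithmetic concrete, here is a quick back-of-the-envelope check in Python, using the numbers from the paragraph above:

    # Each run's query window reaches back event_lag seconds, but a new run
    # starts every interval seconds, so consecutive windows can overlap by
    # (event_lag - interval) when event_lag is larger than interval.
    event_lag = 30 * 60   # 30 minutes, in seconds
    interval = 1 * 60     # 1 minute, in seconds

    overlap_minutes = max(event_lag - interval, 0) / 60
    print(overlap_minutes)  # 29.0 -> 29 minutes duplicated on every execution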

The final time/date field to consider is the checkpoint. The checkpoint timestamp is also in UTC, and it is equal to the latest end_date sent to the API.

You can think of these timestamps as described below:

start_date = earliest time to pull data from the Log Analytics API (aka start time/date)
end_date = time of execution MINUS the event_lag specified (with microseconds removed)
event_lag = amount of time in seconds to always subtract from end_date before submitting the query to the API
interval = how often the input tries to pull data
checkpoint = after successfully completing one run, set to the latest end_date used by the input; used as the start_date on the next run

Considering the above, the following logic is true of every execution for each input you've defined:

If the input has not run at least once before:

checkpoint will be empty at the beginning of the first run
start_date will be equal to the start_date specified on the input

-OR- if start_date is not specified on the input, it will default to Jan 1st 1970

end_date will be equal to the current time in UTC minus the event_lag
event_lag & interval will be what you set on the input

Else if the input has run at least once before:

checkpoint will be equal to end_date from the previous run.
start_date will be equal to the checkpoint, which should be equal to (the UTC timestamp of the last run minus the event_lag)

-OR- if kvstore has failed -OR- if the checkpoint has been removed from kvstore:

start_date will be equal to the start_date specified on the input

-OR- if start_date is not specified on the input,

start_date will default to Jan 1st 1970

end_date will be equal to the current time in UTC minus the event_lag
event_lag & interval will be what you set on the input

Finally, when the code executes it pulls the data from start_date to end_date.
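
Putting the whole flow together, here is a minimal Python sketch of the per-run logic described above. It is an illustration of the description, not the add-on's actual source; run_input, query_log_analytics, and the _checkpoint variable are hypothetical stand-ins (the real app keeps its checkpoint in the kvstore):

    from datetime import datetime, timedelta

    EPOCH = datetime(1970, 1, 1)   # the Jan 1st 1970 default
    _checkpoint = None             # stand-in for the kvstore checkpoint

    def query_log_analytics(start_date, end_date):
        # Hypothetical stand-in for the real API call; it just reports the window.
        print("pulling events from %s to %s" % (start_date, end_date))

    def run_input(configured_start_date=None, event_lag_seconds=900):
        # One execution of an input, following the logic described above.
        global _checkpoint

        # end_date = current UTC time minus event_lag, with microseconds removed.
        end_date = (datetime.utcnow().replace(microsecond=0)
                    - timedelta(seconds=event_lag_seconds))

        # start_date = the checkpoint if one exists; otherwise the start_date
        # configured on the input; otherwise Jan 1st 1970. (A missing checkpoint
        # covers both the first run and a failed/cleared kvstore.)
        if _checkpoint is not None:
            start_date = _checkpoint
        elif configured_start_date is not None:
            start_date = configured_start_date
        else:
            start_date = EPOCH

        query_log_analytics(start_date, end_date)

        # Only after a successful run does the checkpoint advance to the
        # end_date just used; it becomes the next run's start_date.
        _checkpoint = end_date

    run_input()  # first run: checkpoint empty, start_date defaults to 1970
    run_input()  # second run: start_date == previous run's end_date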

anwar114
Explorer

It's a bit of strange behaviour. When I disable and re-enable the input, it pulls all the events from the event hub, and all works well.
Thereafter, on the regular interval pulls, it skips some events and pulls others (I cannot figure out what goes wrong that causes certain events to be skipped and certain events to be pulled during the regular pulls).


jkat54
SplunkTrust

The Log Analytics app doesn't pull data from Event Hubs. It pulls from the Log Analytics API.


jkat54
SplunkTrust

Event lag and interval should be the same in most cases, unless you'd rather risk duplicating data than possibly missing data, in which case your interval should be less than your lag.


anwar114
Explorer

These are my settings now:
interval : 840 (14 min)
lag time : 15

My initial settings were as below, and they worked until about two weeks ago; then I changed to the above. Neither configuration is working.
interval : 60
lag time : 15


jkat54
SplunkTrust

Try an interval of 15 minutes and a lag of 30 minutes.
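
Since the input settings earlier in this thread are given in seconds, that suggestion converts as follows (the variable names just mirror the thread's terms):

    # jkat54's suggestion, converted to the seconds-based settings used above.
    interval = 15 * 60   # 900 seconds
    event_lag = 30 * 60  # 1800 seconds (lag time)
    print(interval, event_lag)  # 900 1800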


anwar114
Explorer

I made the change, and it looks like the events are appearing now. I will monitor it.
Thanks for the quick response and suggestions.


anwar114
Explorer

I would appreciate it if @jkat54 could have a look.
