All Apps and Add-ons

Splunk Add-on for Microsoft Cloud Services: Storage Table Indexing Delay

kmanson
Path Finder

Trying to figure out the delay in reading from an Azure Storage table. According to the logs, the event happened at 14:24, 5mins later it wrote to storage table, 75mins later Splunk indexed the event.

Example Event
_time: 2016-10-24 14:24:58.395 (_time taken from EventTime)
_indextime: 2016-10-24 15:44:34 (Indexed Time)
Timestamp: 2016-10-24T14:29:35.0328389Z (Time event wrote to storage table)
EventTime: 2016-10-24T14:24:58.395405Z (Time event occured)

inputs.conf
[mscs_storage_table://LinuxsyslogVer2v0]
account = REDACTED
collection_interval = 120
index = os
sourcetype = mscs:storage:linux
start_time = 2016-10-13T00:00:00+00:00

table_list = LinuxsyslogVer2v0

These scheduler.py happen every 120 seconds, until 16:18:44 when the file=mscs_storage_table_data_collector.py runs again.

2016-10-24 16:18:44,214 +0000 log_level=INFO, pid=60713, tid=Thread-126, file=mscs_storage_table_data_collector.py, func_name=do_collect_data, code_line_no=79 | [stanza_name="LinuxsyslogVer2v0" account_name="jetprdeast2audit" table_name="LinuxsyslogVer2v0"] The filter is "(Timestamp ge datetime'2016-10-24T14:29:44Z') and (Timestamp lt datetime'2016-10-24T16:17:44Z')".
2016-10-24 14:40:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-7, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:40:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999464, and there are 1 jobs scheduling
2016-10-24 14:38:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-6, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:38:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999281, and there are 1 jobs scheduling
2016-10-24 14:36:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-4, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:36:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999461, and there are 1 jobs scheduling
2016-10-24 14:34:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-7, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:34:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999312, and there are 1 jobs scheduling
2016-10-24 14:32:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-6, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:32:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.998499, and there are 1 jobs scheduling
2016-10-24 14:30:44,216 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_do_collect_data, code_line_no=84 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] query_entities_page_size=1000 event_cnt_per_item=100 query_end_time_offset=60.0
2016-10-24 14:30:44,216 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_do_collect_data, code_line_no=79 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] The filter is "(Timestamp ge datetime'2016-10-24T12:41:44Z') and (Timestamp lt datetime'2016-10-24T14:29:44Z')".
2016-10-24 14:30:44,216 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_service_factory.py, func_name=_set_proxy, code_line_no=68 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] proxy is disabled.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_do_collect_data, code_line_no=68 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] Finishing pre-process checkpoint.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_pre_process_ckpt, code_line_no=152 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] The entities between 2016-10-24T10:51:44Z and 2016-10-24T12:41:44Z are all collected, updating the checkpoint.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_pre_process_ckpt, code_line_no=145 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] The checkpoint status=all_done.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=_do_collect_data, code_line_no=65 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] Starting to pre-process checkpoint.
2016-10-24 14:30:44,215 +0000 log_level=INFO, pid=60713, tid=Thread-124, file=mscs_storage_table_data_collector.py, func_name=collect_data, code_line_no=54 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] Starting to collect data.
2016-10-24 14:30:44,214 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_dispatcher.py, func_name=_dispatch_tasks, code_line_no=122 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] The number of qualified_storage is 1
2016-10-24 14:30:44,214 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_table_dispatcher.py, func_name=_get_storage_info_list, code_line_no=63 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] The qualified table_list=[u'LinuxsyslogVer2v0']
2016-10-24 14:30:44,214 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_table_dispatcher.py, func_name=_get_storage_info_list, code_line_no=57 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] The table_list=[u'LinuxCpuVer2v0', u'LinuxDiskVer2v0', u'LinuxMemoryVer2v0', u'LinuxsyslogVer2v0', u'MemoryVer2v0', u'SchemasTable', u'WADDiagnosticInfrastructureLogsTable', u'WADMetrics', u'WADMetrics', u'WADMetrics', u'WADMetrics', u'WADPerformanceCountersTable', u'WADWindowsEventLogsTable']
2016-10-24 14:30:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_service_factory.py, func_name=_set_proxy, code_line_no=68 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] proxy is disabled.
2016-10-24 14:30:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_table_dispatcher.py, func_name=_log_compiled_table_names, code_line_no=42 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] The patterns=[u'LinuxsyslogVer2v0$']
2016-10-24 14:30:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-123, file=mscs_storage_dispatcher.py, func_name=_dispatch_storage_list, code_line_no=85 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Starting to dispatch storage list.
2016-10-24 14:30:44,180 +0000 log_level=INFO, pid=60713, tid=Thread-5, file=mscs_storage_table_list_data_collector.py, func_name=collect_data, code_line_no=31 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Starting to get data from data_writer.
2016-10-24 14:30:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-5, file=mscs_storage_table_list_data_collector.py, func_name=collect_data, code_line_no=27 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Starting to collect data.
2016-10-24 14:30:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-5, file=mscs_storage_dispatcher.py, func_name=
init, code_line_no=40 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] worker_threads_num=10
2016-10-24 14:30:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-5, file=ta_data_collector.py, func_name=index_data, code_line_no=122 | [stanza_name="LinuxsyslogVer2v0" account="acctname"] Start indexing data for checkpoint_key=LinuxsyslogVer2v0
acctname
2016-10-24 14:30:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999494, and there are 1 jobs scheduling
2016-10-24 14:29:52,377 +0000 log_level=INFO, pid=60713, tid=Thread-4, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:29:52,377 +0000 log_level=INFO, pid=60713, tid=Thread-4, file=ta_data_collector.py, func_name=index_data, code_line_no=130 | [stanza_name="LinuxsyslogVer2v0" account="acctname"] End of indexing data for checkpoint_key=LinuxsyslogVer2v0
_acctname
2016-10-24 14:29:52,376 +0000 log_level=INFO, pid=60713, tid=Thread-4, file=mscs_storage_table_list_data_collector.py, func_name=collect_data, code_line_no=61 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Retrieve the remain data from data_writer.
2016-10-24 14:29:49,375 +0000 log_level=INFO, pid=60713, tid=Thread-121, file=mscs_storage_dispatcher.py, func_name=_dispatch_storage_list, code_line_no=88 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_list="LinuxsyslogVer2v0"] Finished dispatching storage list.
2016-10-24 14:29:49,340 +0000 log_level=INFO, pid=60713, tid=Thread-122, file=mscs_storage_table_data_collector.py, func_name=collect_data, code_line_no=57 | [stanza_name="LinuxsyslogVer2v0" account_name="acctname" table_name="LinuxsyslogVer2v0"] Finish collecting data.
2016-10-24 14:28:44,179 +0000 log_level=INFO, pid=60713, tid=Thread-7, file=thread_pool.py, func_name=_run, code_line_no=261 | Thread work_queue_size=0
2016-10-24 14:28:44,178 +0000 log_level=INFO, pid=60713, tid=Thread-2, file=scheduler.py, func_name=get_ready_jobs, code_line_no=100 | Get 1 ready jobs, next duration is 119.999480, and there are 1 jobs scheduling

0 Karma

kmanson
Path Finder

Apparently this is by design, way to go Splunk!

The updated information for this issue is that the Add-On is currently working as intended. There has been an Enhancement request entered to have this addressed in a future development timeline.

0 Karma

sk314
Builder

I have a similar issue with the add-on - however I have it for o365 audit logs. Following this question - Please let us know here if you figure out the reason.

0 Karma

kmanson
Path Finder

It might not be the same issue, but I will keep everyone updated. I also have support case opened for my issue as well.

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...