Splunk App for NetApp Data ONTAP: Why am I receiving "read operation timed out" errors when collecting filers over a WAN connection?

lee_melvin
Path Finder

I'm getting read timeout errors for NetApp data collection using the Splunk App for NetApp Data ONTAP, when collecting from filers over a WAN connection. In the "Connection Problems in the Past Hour" report, I get messages like this:

2016-09-22 07:36:41,912 ERROR [ta_ontap_collection_worker://zeta:4535] [QuotaHandler] Problem collecting ontap:quota data from server=somefiler.somesite.mentorg.com : ('The read operation timed out',)
Traceback (most recent call last):
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/handlers.py", line 129, in run
    results = qa.run()
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapInventory.py", line 106, in run
    return self.run7Mode()
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapInventoryQuota.py", line 20, in run7Mode
    self.aggregate_results(data, 'quota-report-iter-start', secondLevelDictArray)
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapInventory.py", line 74, in aggregate_results
    for x in gen:
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapClient.py", line 521, in query7ModeGen
    response = naElementToDict(self.queryApi(api, OntapClient.projectResponse))
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapClient.py", line 418, in queryApi
    response = self.connection.invoke_elem(api)
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/NetApp/NaServer.py", line 483, in invoke_elem
    xml_response = response.read()
  File "/opt/splunk/lib/python2.7/httplib.py", line 593, in read
    s = self.fp.read()
  File "/opt/splunk/lib/python2.7/socket.py", line 355, in read
    data = self._sock.recv(rbufsize)
  File "/opt/splunk/lib/python2.7/ssl.py", line 734, in recv
    return self.read(buflen)
  File "/opt/splunk/lib/python2.7/ssl.py", line 621, in read
    v = self._sslobj.read(len or 1024)
SSLError: ('The read operation timed out',)

The problem happens with filers at multiple sites, but only occasionally; most of the time the data is collected as expected. WAN saturation does not appear to be an issue.

The lowest-level configurable timeout I could find that appears to apply to the SSL connection is the CONNECTION_TIMEOUT=30 setting in Splunk_TA_ontap/bin/ta_ontap/NetApp/OntapClient.py. I have changed it to 60 and will see whether that alleviates the issue.
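
For anyone wondering why a longer timeout would help: the SSLError at the bottom of the traceback is what a read raises when no data arrives before the socket's timeout expires. Here is a minimal sketch of that behaviour. It uses plain TCP sockets on localhost rather than SSL, and it is not the add-on's code; `start_slow_server` and `read_with_timeout` are hypothetical helper names made up for the illustration.

```python
import socket
import threading
import time


def start_slow_server(delay_seconds):
    """Start a local one-shot TCP server that waits delay_seconds before replying.

    Hypothetical helper for illustration. Returns the (host, port) it listens on.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    address = listener.getsockname()

    def serve_once():
        conn, _ = listener.accept()
        try:
            time.sleep(delay_seconds)  # stand-in for a slow filer or WAN link
            conn.sendall(b"<results/>")
        except socket.error:
            pass  # the client may already have given up and disconnected
        finally:
            conn.close()
            listener.close()

    worker = threading.Thread(target=serve_once)
    worker.daemon = True
    worker.start()
    return address


def read_with_timeout(address, timeout_seconds):
    """Connect, then wait at most timeout_seconds for the first bytes.

    Returns the bytes received, or None if the read timed out.
    """
    client = socket.create_connection(address, timeout=timeout_seconds)
    try:
        return client.recv(1024)
    except socket.timeout:
        # Plain-socket analogue of SSLError: 'The read operation timed out'
        return None
    finally:
        client.close()
```

With a server that takes 0.5 s to answer, `read_with_timeout(address, 0.1)` gives up and returns None, while a 2 s timeout gets the response. The same reasoning would apply to a filer that occasionally needs more than 30 s to answer a quota report, although that is an assumption about the cause, not something the traceback proves.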

We have filers at 30+ sites, but the DCNs (data collection nodes) are all in our primary datacenter. Would it be better to have local DCNs and not collect data over the WAN? If I create a bunch of site-local DCNs, will the scheduler automatically know which DCN is "closest" to a filer (or can that be configured manually)?

Thanks!

Lee

lee_melvin
Path Finder

Changing the timeout from 30s to 60s dropped my read timeout error rate from ~25/hr to about 3/hr. I'm not sure if I'm solving a problem (WAN data collection just needs more time) or masking some other issue that needs solving, but it reduces my gaps in data collection. I've set a standard of CONNECTION_TIMEOUT = 120 for all existing and future DCNs in our environment.
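
To compare error rates before and after a timeout change, the timeout lines can be counted per hour and per filer. This is only a rough sketch: the regular expression is modeled on the single ERROR line quoted in the question and may need adjusting for other handlers or log formats, and `count_timeouts_per_hour` is a hypothetical helper, not something shipped with the app.

```python
import re
from collections import Counter

# Modeled on the ERROR line quoted in the question: timestamp, "ERROR",
# "server=<host>", then the "read operation timed out" tuple.
TIMEOUT_LINE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} ERROR .*"
    r"server=(?P<server>\S+) : \('The read operation timed out',\)"
)


def count_timeouts_per_hour(lines):
    """Count read-timeout errors keyed by (date and hour, server).

    Lines that do not match the pattern (traceback frames, other errors)
    are ignored.
    """
    counts = Counter()
    for line in lines:
        match = TIMEOUT_LINE.match(line)
        if match:
            counts[(match.group("hour"), match.group("server"))] += 1
    return counts
```

Feeding it the lines of the collection log (for example, an open file object) returns a Counter, so `counts.most_common(10)` would list the hour and filer combinations with the most timeouts.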

cmeerbeek
Path Finder

We have exactly the same error and have created a support case for this. I will try to do as you suggested and will see how this plays out.
