Splunk App for NetApp Data ONTAP: Why am I receiving "read operation timed out" errors when collecting filers over a WAN connection?

lee_melvin
Path Finder

I'm getting read timeout errors for NetApp data collection using the Splunk App for NetApp Data ONTAP, when collecting from filers over a WAN connection. In the "Connection Problems in the Past Hour" report, I get messages like this:

2016-09-22 07:36:41,912 ERROR [ta_ontap_collection_worker://zeta:4535] [QuotaHandler] Problem collecting ontap:quota data from server=somefiler.somesite.mentorg.com : ('The read operation timed out',)
Traceback (most recent call last):
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/handlers.py", line 129, in run
    results = qa.run()
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapInventory.py", line 106, in run
    return self.run7Mode()
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapInventoryQuota.py", line 20, in run7Mode
    self.aggregate_results(data, 'quota-report-iter-start', secondLevelDictArray)
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapInventory.py", line 74, in aggregate_results
    for x in gen:
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapClient.py", line 521, in query7ModeGen
    response = naElementToDict(self.queryApi(api, OntapClient.projectResponse))
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/OntapClient.py", line 418, in queryApi
    response = self.connection.invoke_elem(api)
  File "/opt/splunk/etc/apps/Splunk_TA_ontap/bin/ta_ontap/NetApp/NaServer.py", line 483, in invoke_elem
    xml_response = response.read()
  File "/opt/splunk/lib/python2.7/httplib.py", line 593, in read
    s = self.fp.read()
  File "/opt/splunk/lib/python2.7/socket.py", line 355, in read
    data = self._sock.recv(rbufsize)
  File "/opt/splunk/lib/python2.7/ssl.py", line 734, in recv
    return self.read(buflen)
  File "/opt/splunk/lib/python2.7/ssl.py", line 621, in read
    v = self._sslobj.read(len or 1024)
SSLError: ('The read operation timed out',)

The problem happens with filers at multiple sites, but only occasionally; most of the time the data is collected as expected. WAN saturation does not appear to be an issue.

The lowest-level configurable timeout I could find that (apparently) applies to the SSL connection is in Splunk_TA_ontap/bin/ta_ontap/NetApp/OntapClient.py, which has a CONNECTION_TIMEOUT=30 setting. I have changed that to 60 and will see if it alleviates the issue.
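For anyone wondering what a setting like this does at the socket level, here is a minimal sketch. It is not the TA's code: I am assuming CONNECTION_TIMEOUT ends up as an ordinary socket timeout on the HTTP(S) connection, which the traceback suggests but which I have not confirmed in the source. SlowHandler, fetch and demo are hypothetical names, and the sketch uses plain HTTP on localhost with Python 3 rather than SSL with the Python 2.7 that ships with Splunk.

```python
import http.client
import socket
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a filer that answers slowly over a WAN."""
    delay = 0.5  # seconds the "filer" takes before it responds

    def do_GET(self):
        time.sleep(self.delay)
        body = b"<results/>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(port, timeout):
    """Return the response body, or None if a socket operation timed out."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().read()
    except socket.timeout:
        # Same failure class as 'The read operation timed out' above.
        return None
    finally:
        conn.close()


def demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        too_short = fetch(port, timeout=0.1)    # shorter than the server delay
        long_enough = fetch(port, timeout=2.0)  # longer than the server delay
    finally:
        server.shutdown()
        server.server_close()
    return too_short, long_enough
```

The timeout applies to each blocking socket operation rather than to the whole request, so a response that keeps arriving slowly can take longer overall than the timeout value, while a single stall longer than the timeout fails the read.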

We have filers at 30+ sites, but the data collection nodes (DCNs) are all in our primary datacenter. Would it be better to have local DCNs and not collect data over the WAN? If I create a bunch of site-local DCNs, will the scheduler automatically know which DCN is "closest" to a filer (or can that be manually configured)?

Thanks!

Lee

lee_melvin
Path Finder

Changing the timeout from 30s to 60s dropped my read timeout error rate from ~25/hr to about 3/hr. I'm not sure if I'm solving a problem (WAN data collection just needs more time) or masking some other issue that needs solving, but it reduces my gaps in data collection. I've set a standard of CONNECTION_TIMEOUT = 120 for all existing and future DCNs in our environment.
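One way to check whether the longer timeout is masking something is to break the remaining errors down by hour, filer and data type, and see whether they cluster. The sketch below is only an illustration: the regular expression is written against the single log line quoted in the question, so it assumes other error lines share that layout, and count_read_timeouts is a hypothetical helper, not part of the app.

```python
import re
from collections import Counter

# Built from the one sample line in this thread; other layouts will not match.
LINE_RE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d+ ERROR .*?"
    r"Problem collecting (?P<sourcetype>\S+) data from server=(?P<server>\S+)"
    r".*read operation timed out"
)


def count_read_timeouts(lines):
    """Count read-timeout errors per (hour, server, sourcetype)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            key = (match.group("hour"),
                   match.group("server"),
                   match.group("sourcetype"))
            counts[key] += 1
    return counts
```

If the counts concentrate on a few filers or on one data type such as ontap:quota, that points at slow responses from those sources rather than at the WAN in general.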

cmeerbeek
Path Finder

We have exactly the same error and have created a support case for this. I will try to do as you suggested and will see how this plays out.
