The lack of buffering is definitely the problem here.
I created the following class:
import io

class ResponseReaderWrapper(io.RawIOBase):
    """Adapts a ResponseReader to the raw stream interface io.BufferedReader expects."""

    def __init__(self, responseReader):
        self.responseReader = responseReader

    def readable(self):
        return True

    def close(self):
        self.responseReader.close()

    def read(self, n):
        return self.responseReader.read(n)

    def readinto(self, b):
        # Fill the caller's buffer with up to len(b) bytes and report how many were written.
        data = self.responseReader.read(len(b))
        b[:len(data)] = data
        return len(data)
This lets me wrap the response in io.BufferedReader as follows:
rs = job.results(count=maxRecords, offset=self._offset)
results.ResultsReader(io.BufferedReader(ResponseReaderWrapper(rs)))
With this change, running my query and pulling the results now takes ~3 seconds rather than 90+ seconds as before.
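To illustrate why the buffering helps, here is a minimal, self-contained sketch that does not touch the SDK at all. CountingReader is a hypothetical stand-in for the response reader (it only offers read(n) and close()), and RawAdapter is a trimmed copy of the wrapper above. It assumes the consumer issues many very small reads, which is my guess at what the results parser does, and simply counts how many reads reach the underlying source with and without io.BufferedReader in between.

```python
import io


class CountingReader(object):
    """Hypothetical stand-in for the response reader: read(n) and close() only."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0
        self.calls = 0  # number of read() calls that reached this object

    def read(self, n):
        self.calls += 1
        chunk = self._payload[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk

    def close(self):
        pass


class RawAdapter(io.RawIOBase):
    """Trimmed version of the wrapper: just enough for io.BufferedReader."""

    def __init__(self, reader):
        self.reader = reader

    def readable(self):
        return True

    def readinto(self, b):
        data = self.reader.read(len(b))
        b[:len(data)] = data
        return len(data)


payload = b"x" * 10000

# Unbuffered: every one-byte read goes straight to the source.
unbuffered = CountingReader(payload)
while unbuffered.read(1):
    pass

# Buffered: one-byte reads are served from a 4096-byte buffer.
source = CountingReader(payload)
buffered = io.BufferedReader(RawAdapter(source), buffer_size=4096)
while buffered.read(1):
    pass

print("unbuffered reads:", unbuffered.calls)
print("buffered reads:  ", source.calls)
```

On a real HTTP response each of those underlying reads is far more expensive than a slice of an in-memory string, so cutting their number down is where I would expect the time saving to come from.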
It would be nice if ResponseReader implemented the readable and readinto methods so that it was more stream-like; then this ResponseReaderWrapper class wouldn't be necessary. I'm happy to provide a pull request for this if you agree.