Discussion:
Scipy sparse large matrix multiplication and PyTables
Elliott Ash
2016-10-04 01:29:27 UTC
Did you ever figure out the best way to do this? The column slices on
PyTables are pretty slow. How can I get the transpose of a CArray object?
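One possible answer to the transpose question is to store a second, transposed copy on disk, so that column slices of the original become row slices of the copy. The sketch below is only an illustration under assumptions: `write_transpose` and `tile` are hypothetical names, and plain NumPy arrays stand in for the two CArray objects. It relies only on `.shape` and slice reads and writes, which a CArray also supports, but it has not been timed against PyTables.

```python
import numpy as np


def write_transpose(src, dst, tile=1024):
    """Fill dst (shape (n, m)) with the transpose of src (shape (m, n)).

    Works tile by tile, so only one tile x tile block is in memory at a time.
    """
    m, n = src.shape
    for i in range(0, m, tile):
        i_end = min(i + tile, m)
        for j in range(0, n, tile):
            j_end = min(j + tile, n)
            # Read one tile of the source and write it, transposed,
            # to the mirrored position in the destination.
            block = np.asarray(src[i:i_end, j:j_end])
            dst[j:j_end, i:i_end] = block.T
    return dst


# Small in-memory stand-ins for the on-disk arrays (assumption).
src_demo = np.arange(35, dtype=float).reshape(5, 7)
dst_demo = np.empty((7, 5))
write_transpose(src_demo, dst_demo, tile=3)
```

With PyTables, `dst` would be a second CArray created with the swapped shape. Matching its chunk shape to the tile size may help, though that is a guess to be checked by measurement.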
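import math
import numpy as np
import tables as tb

a = np.random.rand(300, 200)
b = a.T
l, n = a.shape[0], b.shape[1]  # a.dot(b) has shape (l, n)
# open_file/create_carray are the PyTables 3.x names of openFile/createCArray
f = tb.open_file('dot.h5', 'w')
filters = tb.Filters(complevel=5, complib='blosc')
out = f.create_carray(f.root, 'out', tb.Atom.from_dtype(a.dtype),
                      shape=(l, n), filters=filters)
_MB = 2**20
OOC_BUFFER_SIZE = 1024 * _MB * 2  # 2 GiB buffer
buffersize = OOC_BUFFER_SIZE
# power-of-two block width bl with bl * bl * itemsize <= buffersize
bl = math.sqrt(buffersize / out.dtype.itemsize)
bl = 2**int(math.log(bl, 2))
for i in range(0, n, bl):
    # fill one vertical strip of the result at a time
    out[:, i:min(i+bl, n)] = a.dot(b[:, i:min(i+bl, n)])
f.close()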
I hope I have not forgotten something important while copying.
Best,
Philipp
Hi Philipp,
Do you have something that works and is just slow? If so, could you send us
a minimal script that we could take a look at? I think this might be slow
no matter what, but maybe there is a way to make it less slow.
Be Well
Anthony
Hi!
I am currently struggling with very large-scale matrix multiplications. I
have a scipy.sparse.csr_matrix X with shape (350363, 2526183)
and have to multiply it with its transpose. The sparse matrix easily
fits into memory due to its sparsity. However, the resulting matrix X * X.T
is very dense and does not fit in memory. So I thought of using PyTables
for this.
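To put a number on "does not fit in memory", here is a rough estimate of the dense result size. It assumes 8-byte float64 entries, which the post does not state; a different dtype scales the figure accordingly.

```python
n_rows = 350363   # number of rows of X, taken from the shape above
itemsize = 8      # assumption: float64 entries in the dense result

# X * X.T is n_rows x n_rows, so a dense copy needs n_rows**2 entries.
dense_bytes = n_rows * n_rows * itemsize
print(f"{dense_bytes / 1e9:.0f} GB")  # prints "982 GB"
```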
For updating the PyTables array I am using a chunked approach discussed
However, it is very slow, because slicing sparse matrices is not
fast, and my data is immense.
Does anyone know how to best approach this?
Best,
Philipp
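One idea that may be worth trying is to walk over row blocks instead of column blocks: row slices of a CSR matrix are cheap, and each result band can then be written to the output as a run of complete rows. The following is a minimal sketch under assumptions, not a tested solution for data of this size. `gram_by_row_blocks` and `block_rows` are hypothetical names, and a NumPy array stands in for the PyTables CArray (anything that accepts `out[start:stop, :] = ...` should work).

```python
import numpy as np
import scipy.sparse as sp


def gram_by_row_blocks(X, out, block_rows):
    """Write X * X.T into `out`, one horizontal band of rows at a time."""
    X = sp.csr_matrix(X)   # CSR makes row slicing cheap
    Xt = X.T.tocsr()       # convert the transpose once, outside the loop
    n = X.shape[0]
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Sparse product for rows start..stop; result shape is (stop - start, n).
        band = X[start:stop] @ Xt
        # Densify only this band before writing it out.
        out[start:stop, :] = band.toarray()
    return out


# Small random stand-in for the real data (assumption).
X_demo = sp.random(50, 400, density=0.05, format="csr", random_state=0)
out_demo = np.empty((50, 50))
gram_by_row_blocks(X_demo, out_demo, block_rows=16)
```

Each dense band holds block_rows * n values, so with n = 350363 and float64 a band of 1024 rows takes roughly 2.9 GB, and the intermediate sparse product needs additional memory on top of that. Since X * X.T is symmetric, computing only one triangle could roughly halve the work, at the cost of more bookkeeping.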
