[pytables-users] Performance of list of arrays vs one table?

'Hawk Anonymous' via pytables-users

2017-03-03 12:49:31 UTC

Hello,

As I have not received any answers so far, I tried to investigate this myself
using a very simple example: two NumPy arrays filled with random values.
Task: get all values of the first array that are bigger than the
corresponding values in the second one.
I wrote this code to test it:

import tables
import numpy

# table description: one Float64 column per array
class tdesc(tables.IsDescription):
    r0 = tables.Float64Col()
    r1 = tables.Float64Col()

r0 = numpy.random.rand(10**8)
r1 = numpy.random.rand(10**8)
hdf = tables.open_file("test.hdf", "w")
# store the two arrays as separate CArrays
hdf.create_carray(hdf.root, "r0", obj=r0)
hdf.create_carray(hdf.root, "r1", obj=r1)
# store the same data as the two columns of one table, row by row
hdf.create_table(hdf.root, "rr", tdesc)
row = hdf.root.rr.row
for i in range(len(r0)):
    row["r0"] = r0[i]
    row["r1"] = r1[i]
    row.append()
#data prepared in hdf file

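#now, get all r0 values that are bigger than the corresponding r1 values
#(%timeit is an IPython magic, so run the following in IPython/Jupyter)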

#read arrays, let numpy do the search
n = hdf.root.r0[:][hdf.root.r0[:] > hdf.root.r1[:]]
%timeit n = hdf.root.r0[:][hdf.root.r0[:] > hdf.root.r1[:]]
# ->1 loop, best of 3: 2.53 s per loop

#read arrays from table, use numpy for search
m = hdf.root.rr.col("r0")[hdf.root.rr.col("r0")>hdf.root.rr.col("r1")]
%timeit m = hdf.root.rr.col("r0")[hdf.root.rr.col("r0")>hdf.root.rr.col("r1")]
# -> 1 loop, best of 3: 3.65 s per loop

#use in-kernel search
o = [ x['r0'] for x in hdf.root.rr.where("""(r0 > r1)""") ]
%timeit o = [ x['r0'] for x in hdf.root.rr.where("""(r0 > r1)""") ]
# -> 1 loop, best of 3: 6.38 s per loop

print(len(n))
# -> 50002016
print(len(m))
# -> 49973491
print(len(o))
# -> 49973491

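As you can see, the plain arrays + NumPy approach wins: it is about 1.4x faster
than reading the table columns (2.53 s vs 3.65 s) and about 2.5x faster than the
in-kernel search (2.53 s vs 6.38 s).
What I do not understand is why n differs from m and o (which agree with each
other). I think n is correct, because it is always very close to 10**8/2, which
is what I would expect, but why does the table change my results?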
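A quick back-of-the-envelope check supports the intuition that n is the plausible value. Assuming r0 and r1 are independent uniform samples of N = 10**8 values each (which is what `numpy.random.rand` produces), each comparison r0[i] > r1[i] is true with probability 1/2, so the count follows a Binomial(N, 1/2) distribution with mean N/2 and standard deviation sqrt(N)/2 = 5000. The sketch below uses only the counts printed above and computes how many standard deviations each one lies from N/2.

```python
import math

# Assumption: r0 and r1 are independent uniform samples, so each
# comparison r0[i] > r1[i] is true with probability 1/2.
N = 10**8
mean = N / 2            # expected count over all N comparisons
std = math.sqrt(N) / 2  # binomial standard deviation: 5000.0

# Counts reported above for the arrays (n) and the table queries (m, o).
counts = {"n": 50002016, "m": 49973491, "o": 49973491}

# z-score: distance from the expected count in units of standard deviations.
z = {name: (c - mean) / std for name, c in counts.items()}
for name, score in z.items():
    print(f"{name}: {counts[name]} -> z = {score:+.2f}")
```

A z-score of about +0.4 for n is unremarkable, while about -5.3 for m and o is extremely unlikely for a full set of N independent comparisons. That points towards the table queries not seeing all N rows, rather than comparing the values differently. One plausible cause worth checking (an assumption, not confirmed in this thread): rows added with `Row.append()` go into an in-memory buffer that is only written out when it fills up, when `Table.flush()` is called, or when the file is closed, and the script above never calls `hdf.root.rr.flush()` after the loop. Comparing `hdf.root.rr.nrows` with `len(r0)` would confirm or rule this out.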
