Discussion:
[pytables-users] Huge difference in read performance on two different machines.
Tim Althoff
2015-03-04 00:27:13 UTC
Permalink
Hello,

I have two machines (very similar setup, ~96cores, 1TB RAM) that show very
different read performance (49s vs 1586s)
The only thing I'm doing is reading a h5 container in pandas:
pd.read_hdf("foo.h5", "data")

I am using IPython to profile it. Both machines run:
%prun pd.read_hdf("foo.h5", "data")

What could be causing this? None of the machine was under heavy load.

Thanks,
Tim


______________
Log

on machine 1

6023 function calls (5861 primitive calls) in 49.204 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
9 16.088 1.788 16.286 1.810 numeric.py:2328(array_equal)
1 11.735 11.735 11.735 11.735 {method '_read_records' of
'tables.tableextension.Table' objects}
7 6.463 0.923 6.463 0.923 {method 'astype' of
'numpy.ndarray' objects}
4 4.858 1.215 4.858 1.215 {method 'copy' of
'numpy.ndarray' objects}
13 2.941 0.226 2.941 0.226
{numpy.core.multiarray.concatenate}
1 2.395 2.395 2.395 2.395
{pandas.algos.take_2d_axis0_float64_float64}
122 2.061 0.017 2.061 0.017 {numpy.core.multiarray.array}
24 1.273 0.053 1.273 0.053 {method '_g_get_objinfo' of
'tables.hdf5extension.Group' objects}
2 0.960 0.480 0.960 0.480
{pandas.algos.take_2d_axis0_int64_int64}
97 0.200 0.002 0.200 0.002 {method 'reduce' of
'numpy.ufunc' objects}
1 0.113 0.113 0.113 0.113
{pandas.algos.take_2d_axis0_bool_bool}
1 0.023 0.023 47.851 47.851 pytables.py:3747(read)
1 0.022 0.022 49.204 49.204 <string>:1(<module>)
4 0.019 0.005 0.019 0.005 {posix.access}
1 0.018 0.018 0.037 0.037 {method '_g_new' of
'tables.hdf5extension.File' objects}
77 0.002 0.000 0.002 0.000 {method '_g_getattr' of
'tables.hdf5extension.AttributeSet' objects}
1 0.002 0.002 0.002 0.002 {method '_g_read_slice' of
'tables.hdf5extension.Array' objects}
1 0.001 0.001 0.001 0.001
{tables.utilsextension.read_f_attr}
1 0.001 0.001 11.736 11.736 table.py:1859(read)
1 0.001 0.001 0.001 0.001 {method '_close_file' of
'tables.hdf5extension.File' objects}
82 0.001 0.000 0.005 0.000
attributeset.py:281(__getattr__)
5 0.001 0.000 0.001 0.000 {method '_g_get_lchild_attr'
of 'tables.hdf5extension.Group' objects}
4 0.001 0.000 0.002 0.000 {method '_open_array' of
'tables.hdf5extension.Array' objects}
25 0.001 0.000 0.003 0.000 index.py:98(__new__)
734 0.001 0.000 0.001 0.000 {isinstance}
67 0.001 0.000 0.001 0.000 __init__.py:197(u_safe)
4 0.001 0.000 2.944 0.736 merge.py:1096(_concat_blocks)
1 0.001 0.001 3.472 3.472
internals.py:3135(_reindex_indexer_items)
14 0.001 0.000 0.001 0.000 atom.py:114(_normalize_shape)
73 0.001 0.000 0.002 0.000 fromnumeric.py:2259(prod)
25 0.000 0.000 0.001 0.000 {pandas.lib.infer_dtype}
138 0.000 0.000 0.002 0.000
attributeset.py:61(issysattrname)
9 0.000 0.000 0.006 0.001 attributeset.py:200(__init__)
98 0.000 0.000 2.061 0.021 numeric.py:394(asarray)
1 0.000 0.000 0.000 0.000 {pandas.lib.clean_index_list}
16 0.000 0.000 0.001 0.000 internals.py:1864(make_block)
1 0.000 0.000 0.001 0.001 {method '_get_info' of
'tables.tableextension.Table' objects}
12 0.000 0.000 0.001 0.000 index.py:409(is_unique)
98/12 0.000 0.000 0.331 0.028 file.py:1290(_get_node)
186/145 0.000 0.000 1.288 0.009 {getattr}
132 0.000 0.000 0.000 0.000 group.py:815(__setattr__)
20 0.000 0.000 0.001 0.000
common.py:1894(_asarray_tuplesafe)
8/2 0.000 0.000 0.330 0.165 node.py:206(__init__)
4 0.000 0.000 8.523 2.131 pytables.py:1865(convert)
14 0.000 0.000 0.001 0.000 atom.py:486(__init__)





on machine 2

28237 function calls (27926 primitive calls) in 1586.526 seconds

Ordered by: internal time

ncalls tottime percall cumtime percall filename:lineno(function)
9 517.605 57.512 517.605 57.512 {method 'copy' of
'numpy.ndarray' objects}
1 288.453 288.453 288.453 288.453 {method '_read_records' of
'tables.tableextension.Table' objects}
5 220.011 44.002 220.011 44.002 {method 'astype' of
'numpy.ndarray' objects}
8 214.218 26.777 214.218 26.777 {method 'ravel' of
'numpy.ndarray' objects}
1 178.232 178.232 178.232 178.232
{pandas.algos.take_2d_axis0_float64_float64}
2 72.741 36.371 72.741 36.371
{pandas.algos.take_2d_axis0_int64_int64}
109/108 44.865 0.412 44.865 0.415 {numpy.core.multiarray.array}
7 35.176 5.025 35.240 5.034 numeric.py:2328(array_equal)
1 14.484 14.484 14.484 14.484
{pandas.algos.take_2d_axis0_bool_bool}
1 0.267 0.267 1586.385 1586.385 pytables.py:3966(read)
4 0.220 0.055 214.465 53.616
internals.py:4051(get_empty_dtype_and_na)
100 0.065 0.001 0.065 0.001 {method 'reduce' of
'numpy.ufunc' objects}
42 0.029 0.001 0.029 0.001 {method '_g_get_objinfo' of
'tables.hdf5extension.Group' objects}
4 0.025 0.006 0.025 0.006
common.py:282(_isnull_ndarraylike)
243 0.017 0.000 0.037 0.000 inspect.py:247(getmembers)
7884/7843 0.009 0.000 0.052 0.000 {getattr}
1 0.008 0.008 0.082 0.082 __init__.py:20(<module>)
8150 0.007 0.000 0.007 0.000 {method 'append' of 'list'
objects}
243 0.005 0.000 0.005 0.000 {dir}
1 0.004 0.004 0.004 0.004 {method '_g_new' of
'tables.hdf5extension.File' objects}
1 0.004 0.004 0.054 0.054 file.py:20(<module>)
1 0.003 0.003 0.091 0.091 pytables.py:380(__init__)
315 0.003 0.000 0.043 0.000 _past.py:19(previous_api)
74 0.003 0.000 0.003 0.000 __init__.py:199(u_safe)
77 0.002 0.000 0.002 0.000 {method '_g_getattr' of
'tables.hdf5extension.AttributeSet' objects}
1522/1520 0.002 0.000 0.003 0.000 {isinstance}
1 0.002 0.002 0.020 0.020 table.py:13(<module>)
1 0.002 0.002 0.017 0.017 description.py:13(<module>)
1 0.002 0.002 0.010 0.010 atom.py:13(<module>)
92 0.001 0.000 0.007 0.000
attributeset.py:281(__getattr__)
1 0.001 0.001 288.455 288.455 table.py:1859(read)
1 0.001 0.001 0.001 0.001 {method '_close_file' of
'tables.hdf5extension.File' objects}
1 0.001 0.001 0.005 0.005 utils.py:13(<module>)
998/883 0.001 0.000 0.001 0.000 {len}
1 0.001 0.001 0.005 0.005 index.py:13(<module>)
73 0.001 0.000 0.001 0.000 {built-in method __new__ of
type object at 0x7fa28cc79880}
4 0.001 0.000 214.244 53.561 internals.py:4352(is_null)
15 0.001 0.000 0.001 0.000
sre_compile.py:207(_optimize_charset)
26 0.001 0.000 0.002 0.000
description.py:175(_subclass_from_prefix)
272 0.001 0.000 0.001 0.000 {method 'sort' of 'list'
objects}
1 0.001 0.001 0.008 0.008 group.py:13(<module>)
138 0.001 0.000 0.003 0.000
attributeset.py:61(issysattrname)
14/9 0.001 0.000 0.003 0.000 sre_parse.py:379(_parse)
320 0.001 0.000 0.001 0.000 inspect.py:67(ismethod)
5 0.001 0.000 0.001 0.000 {method '_g_get_lchild_attr'
of 'tables.hdf5extension.Group' objects}
290 0.001 0.000 0.001 0.000 {method 'format' of 'str'
objects}
8 0.001 0.000 0.002 0.000
internals.py:2256(_rebuild_blknos_and_blklocs)
1 0.001 0.001 0.001 0.001 exceptions.py:13(<module>)
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Francesc Alted
2015-03-04 08:10:21 UTC
Permalink
Hi,

Just guessing here, but is your HDF5 file compressed with blosc? Iff that
is the case, maybe this is an example of thread overhead that you can try
to alleviate by setting less threads available to blosc (say just 1 til 4):

http://pytables.github.io/usersguide/libref/top_level.html?highlight=thread#tables.set_blosc_max_threads

Francesc
Post by Tim Althoff
Hello,
I have two machines (very similar setup, ~96cores, 1TB RAM) that show very
different read performance (49s vs 1586s)
pd.read_hdf("foo.h5", "data")
%prun pd.read_hdf("foo.h5", "data")
What could be causing this? None of the machine was under heavy load.
Thanks,
Tim
______________
Log
on machine 1
6023 function calls (5861 primitive calls) in 49.204 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
9 16.088 1.788 16.286 1.810 numeric.py:2328(array_equal)
1 11.735 11.735 11.735 11.735 {method '_read_records' of
'tables.tableextension.Table' objects}
7 6.463 0.923 6.463 0.923 {method 'astype' of
'numpy.ndarray' objects}
4 4.858 1.215 4.858 1.215 {method 'copy' of
'numpy.ndarray' objects}
13 2.941 0.226 2.941 0.226
{numpy.core.multiarray.concatenate}
1 2.395 2.395 2.395 2.395
{pandas.algos.take_2d_axis0_float64_float64}
122 2.061 0.017 2.061 0.017 {numpy.core.multiarray.array}
24 1.273 0.053 1.273 0.053 {method '_g_get_objinfo' of
'tables.hdf5extension.Group' objects}
2 0.960 0.480 0.960 0.480
{pandas.algos.take_2d_axis0_int64_int64}
97 0.200 0.002 0.200 0.002 {method 'reduce' of
'numpy.ufunc' objects}
1 0.113 0.113 0.113 0.113
{pandas.algos.take_2d_axis0_bool_bool}
1 0.023 0.023 47.851 47.851 pytables.py:3747(read)
1 0.022 0.022 49.204 49.204 <string>:1(<module>)
4 0.019 0.005 0.019 0.005 {posix.access}
1 0.018 0.018 0.037 0.037 {method '_g_new' of
'tables.hdf5extension.File' objects}
77 0.002 0.000 0.002 0.000 {method '_g_getattr' of
'tables.hdf5extension.AttributeSet' objects}
1 0.002 0.002 0.002 0.002 {method '_g_read_slice' of
'tables.hdf5extension.Array' objects}
1 0.001 0.001 0.001 0.001
{tables.utilsextension.read_f_attr}
1 0.001 0.001 11.736 11.736 table.py:1859(read)
1 0.001 0.001 0.001 0.001 {method '_close_file' of
'tables.hdf5extension.File' objects}
82 0.001 0.000 0.005 0.000
attributeset.py:281(__getattr__)
5 0.001 0.000 0.001 0.000 {method '_g_get_lchild_attr'
of 'tables.hdf5extension.Group' objects}
4 0.001 0.000 0.002 0.000 {method '_open_array' of
'tables.hdf5extension.Array' objects}
25 0.001 0.000 0.003 0.000 index.py:98(__new__)
734 0.001 0.000 0.001 0.000 {isinstance}
67 0.001 0.000 0.001 0.000 __init__.py:197(u_safe)
4 0.001 0.000 2.944 0.736 merge.py:1096(_concat_blocks)
1 0.001 0.001 3.472 3.472
internals.py:3135(_reindex_indexer_items)
14 0.001 0.000 0.001 0.000 atom.py:114(_normalize_shape)
73 0.001 0.000 0.002 0.000 fromnumeric.py:2259(prod)
25 0.000 0.000 0.001 0.000 {pandas.lib.infer_dtype}
138 0.000 0.000 0.002 0.000
attributeset.py:61(issysattrname)
9 0.000 0.000 0.006 0.001 attributeset.py:200(__init__)
98 0.000 0.000 2.061 0.021 numeric.py:394(asarray)
1 0.000 0.000 0.000 0.000 {pandas.lib.clean_index_list}
16 0.000 0.000 0.001 0.000 internals.py:1864(make_block)
1 0.000 0.000 0.001 0.001 {method '_get_info' of
'tables.tableextension.Table' objects}
12 0.000 0.000 0.001 0.000 index.py:409(is_unique)
98/12 0.000 0.000 0.331 0.028 file.py:1290(_get_node)
186/145 0.000 0.000 1.288 0.009 {getattr}
132 0.000 0.000 0.000 0.000 group.py:815(__setattr__)
20 0.000 0.000 0.001 0.000
common.py:1894(_asarray_tuplesafe)
8/2 0.000 0.000 0.330 0.165 node.py:206(__init__)
4 0.000 0.000 8.523 2.131 pytables.py:1865(convert)
14 0.000 0.000 0.001 0.000 atom.py:486(__init__)
on machine 2
28237 function calls (27926 primitive calls) in 1586.526 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
9 517.605 57.512 517.605 57.512 {method 'copy' of
'numpy.ndarray' objects}
1 288.453 288.453 288.453 288.453 {method '_read_records' of
'tables.tableextension.Table' objects}
5 220.011 44.002 220.011 44.002 {method 'astype' of
'numpy.ndarray' objects}
8 214.218 26.777 214.218 26.777 {method 'ravel' of
'numpy.ndarray' objects}
1 178.232 178.232 178.232 178.232
{pandas.algos.take_2d_axis0_float64_float64}
2 72.741 36.371 72.741 36.371
{pandas.algos.take_2d_axis0_int64_int64}
109/108 44.865 0.412 44.865 0.415 {numpy.core.multiarray.array}
7 35.176 5.025 35.240 5.034 numeric.py:2328(array_equal)
1 14.484 14.484 14.484 14.484
{pandas.algos.take_2d_axis0_bool_bool}
1 0.267 0.267 1586.385 1586.385 pytables.py:3966(read)
4 0.220 0.055 214.465 53.616
internals.py:4051(get_empty_dtype_and_na)
100 0.065 0.001 0.065 0.001 {method 'reduce' of
'numpy.ufunc' objects}
42 0.029 0.001 0.029 0.001 {method '_g_get_objinfo' of
'tables.hdf5extension.Group' objects}
4 0.025 0.006 0.025 0.006
common.py:282(_isnull_ndarraylike)
243 0.017 0.000 0.037 0.000 inspect.py:247(getmembers)
7884/7843 0.009 0.000 0.052 0.000 {getattr}
1 0.008 0.008 0.082 0.082 __init__.py:20(<module>)
8150 0.007 0.000 0.007 0.000 {method 'append' of 'list'
objects}
243 0.005 0.000 0.005 0.000 {dir}
1 0.004 0.004 0.004 0.004 {method '_g_new' of
'tables.hdf5extension.File' objects}
1 0.004 0.004 0.054 0.054 file.py:20(<module>)
1 0.003 0.003 0.091 0.091 pytables.py:380(__init__)
315 0.003 0.000 0.043 0.000 _past.py:19(previous_api)
74 0.003 0.000 0.003 0.000 __init__.py:199(u_safe)
77 0.002 0.000 0.002 0.000 {method '_g_getattr' of
'tables.hdf5extension.AttributeSet' objects}
1522/1520 0.002 0.000 0.003 0.000 {isinstance}
1 0.002 0.002 0.020 0.020 table.py:13(<module>)
1 0.002 0.002 0.017 0.017 description.py:13(<module>)
1 0.002 0.002 0.010 0.010 atom.py:13(<module>)
92 0.001 0.000 0.007 0.000
attributeset.py:281(__getattr__)
1 0.001 0.001 288.455 288.455 table.py:1859(read)
1 0.001 0.001 0.001 0.001 {method '_close_file' of
'tables.hdf5extension.File' objects}
1 0.001 0.001 0.005 0.005 utils.py:13(<module>)
998/883 0.001 0.000 0.001 0.000 {len}
1 0.001 0.001 0.005 0.005 index.py:13(<module>)
73 0.001 0.000 0.001 0.000 {built-in method __new__ of
type object at 0x7fa28cc79880}
4 0.001 0.000 214.244 53.561 internals.py:4352(is_null)
15 0.001 0.000 0.001 0.000
sre_compile.py:207(_optimize_charset)
26 0.001 0.000 0.002 0.000
description.py:175(_subclass_from_prefix)
272 0.001 0.000 0.001 0.000 {method 'sort' of 'list'
objects}
1 0.001 0.001 0.008 0.008 group.py:13(<module>)
138 0.001 0.000 0.003 0.000
attributeset.py:61(issysattrname)
14/9 0.001 0.000 0.003 0.000 sre_parse.py:379(_parse)
320 0.001 0.000 0.001 0.000 inspect.py:67(ismethod)
5 0.001 0.000 0.001 0.000 {method '_g_get_lchild_attr'
of 'tables.hdf5extension.Group' objects}
290 0.001 0.000 0.001 0.000 {method 'format' of 'str'
objects}
8 0.001 0.000 0.002 0.000
internals.py:2256(_rebuild_blknos_and_blklocs)
1 0.001 0.001 0.001 0.001 exceptions.py:13(<module>)
--
You received this message because you are subscribed to the Google Groups
"pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.
--
Francesc Alted
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Continue reading on narkive:
Loading...