Tim Althoff
2015-03-04 00:27:13 UTC
Hello,
I have two machines (very similar setup, ~96cores, 1TB RAM) that show very
different read performance (49s vs 1586s)
The only thing I'm doing is reading a h5 container in pandas:
pd.read_hdf("foo.h5", "data")
I am using IPython to profile it. Both machines run:
%prun pd.read_hdf("foo.h5", "data")
What could be causing this? None of the machine was under heavy load.
Thanks,
Tim
______________
Log
on machine 1
6023 function calls (5861 primitive calls) in 49.204 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
9 16.088 1.788 16.286 1.810 numeric.py:2328(array_equal)
1 11.735 11.735 11.735 11.735 {method '_read_records' of
'tables.tableextension.Table' objects}
7 6.463 0.923 6.463 0.923 {method 'astype' of
'numpy.ndarray' objects}
4 4.858 1.215 4.858 1.215 {method 'copy' of
'numpy.ndarray' objects}
13 2.941 0.226 2.941 0.226
{numpy.core.multiarray.concatenate}
1 2.395 2.395 2.395 2.395
{pandas.algos.take_2d_axis0_float64_float64}
122 2.061 0.017 2.061 0.017 {numpy.core.multiarray.array}
24 1.273 0.053 1.273 0.053 {method '_g_get_objinfo' of
'tables.hdf5extension.Group' objects}
2 0.960 0.480 0.960 0.480
{pandas.algos.take_2d_axis0_int64_int64}
97 0.200 0.002 0.200 0.002 {method 'reduce' of
'numpy.ufunc' objects}
1 0.113 0.113 0.113 0.113
{pandas.algos.take_2d_axis0_bool_bool}
1 0.023 0.023 47.851 47.851 pytables.py:3747(read)
1 0.022 0.022 49.204 49.204 <string>:1(<module>)
4 0.019 0.005 0.019 0.005 {posix.access}
1 0.018 0.018 0.037 0.037 {method '_g_new' of
'tables.hdf5extension.File' objects}
77 0.002 0.000 0.002 0.000 {method '_g_getattr' of
'tables.hdf5extension.AttributeSet' objects}
1 0.002 0.002 0.002 0.002 {method '_g_read_slice' of
'tables.hdf5extension.Array' objects}
1 0.001 0.001 0.001 0.001
{tables.utilsextension.read_f_attr}
1 0.001 0.001 11.736 11.736 table.py:1859(read)
1 0.001 0.001 0.001 0.001 {method '_close_file' of
'tables.hdf5extension.File' objects}
82 0.001 0.000 0.005 0.000
attributeset.py:281(__getattr__)
5 0.001 0.000 0.001 0.000 {method '_g_get_lchild_attr'
of 'tables.hdf5extension.Group' objects}
4 0.001 0.000 0.002 0.000 {method '_open_array' of
'tables.hdf5extension.Array' objects}
25 0.001 0.000 0.003 0.000 index.py:98(__new__)
734 0.001 0.000 0.001 0.000 {isinstance}
67 0.001 0.000 0.001 0.000 __init__.py:197(u_safe)
4 0.001 0.000 2.944 0.736 merge.py:1096(_concat_blocks)
1 0.001 0.001 3.472 3.472
internals.py:3135(_reindex_indexer_items)
14 0.001 0.000 0.001 0.000 atom.py:114(_normalize_shape)
73 0.001 0.000 0.002 0.000 fromnumeric.py:2259(prod)
25 0.000 0.000 0.001 0.000 {pandas.lib.infer_dtype}
138 0.000 0.000 0.002 0.000
attributeset.py:61(issysattrname)
9 0.000 0.000 0.006 0.001 attributeset.py:200(__init__)
98 0.000 0.000 2.061 0.021 numeric.py:394(asarray)
1 0.000 0.000 0.000 0.000 {pandas.lib.clean_index_list}
16 0.000 0.000 0.001 0.000 internals.py:1864(make_block)
1 0.000 0.000 0.001 0.001 {method '_get_info' of
'tables.tableextension.Table' objects}
12 0.000 0.000 0.001 0.000 index.py:409(is_unique)
98/12 0.000 0.000 0.331 0.028 file.py:1290(_get_node)
186/145 0.000 0.000 1.288 0.009 {getattr}
132 0.000 0.000 0.000 0.000 group.py:815(__setattr__)
20 0.000 0.000 0.001 0.000
common.py:1894(_asarray_tuplesafe)
8/2 0.000 0.000 0.330 0.165 node.py:206(__init__)
4 0.000 0.000 8.523 2.131 pytables.py:1865(convert)
14 0.000 0.000 0.001 0.000 atom.py:486(__init__)
on machine 2
28237 function calls (27926 primitive calls) in 1586.526 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
9 517.605 57.512 517.605 57.512 {method 'copy' of
'numpy.ndarray' objects}
1 288.453 288.453 288.453 288.453 {method '_read_records' of
'tables.tableextension.Table' objects}
5 220.011 44.002 220.011 44.002 {method 'astype' of
'numpy.ndarray' objects}
8 214.218 26.777 214.218 26.777 {method 'ravel' of
'numpy.ndarray' objects}
1 178.232 178.232 178.232 178.232
{pandas.algos.take_2d_axis0_float64_float64}
2 72.741 36.371 72.741 36.371
{pandas.algos.take_2d_axis0_int64_int64}
109/108 44.865 0.412 44.865 0.415 {numpy.core.multiarray.array}
7 35.176 5.025 35.240 5.034 numeric.py:2328(array_equal)
1 14.484 14.484 14.484 14.484
{pandas.algos.take_2d_axis0_bool_bool}
1 0.267 0.267 1586.385 1586.385 pytables.py:3966(read)
4 0.220 0.055 214.465 53.616
internals.py:4051(get_empty_dtype_and_na)
100 0.065 0.001 0.065 0.001 {method 'reduce' of
'numpy.ufunc' objects}
42 0.029 0.001 0.029 0.001 {method '_g_get_objinfo' of
'tables.hdf5extension.Group' objects}
4 0.025 0.006 0.025 0.006
common.py:282(_isnull_ndarraylike)
243 0.017 0.000 0.037 0.000 inspect.py:247(getmembers)
7884/7843 0.009 0.000 0.052 0.000 {getattr}
1 0.008 0.008 0.082 0.082 __init__.py:20(<module>)
8150 0.007 0.000 0.007 0.000 {method 'append' of 'list'
objects}
243 0.005 0.000 0.005 0.000 {dir}
1 0.004 0.004 0.004 0.004 {method '_g_new' of
'tables.hdf5extension.File' objects}
1 0.004 0.004 0.054 0.054 file.py:20(<module>)
1 0.003 0.003 0.091 0.091 pytables.py:380(__init__)
315 0.003 0.000 0.043 0.000 _past.py:19(previous_api)
74 0.003 0.000 0.003 0.000 __init__.py:199(u_safe)
77 0.002 0.000 0.002 0.000 {method '_g_getattr' of
'tables.hdf5extension.AttributeSet' objects}
1522/1520 0.002 0.000 0.003 0.000 {isinstance}
1 0.002 0.002 0.020 0.020 table.py:13(<module>)
1 0.002 0.002 0.017 0.017 description.py:13(<module>)
1 0.002 0.002 0.010 0.010 atom.py:13(<module>)
92 0.001 0.000 0.007 0.000
attributeset.py:281(__getattr__)
1 0.001 0.001 288.455 288.455 table.py:1859(read)
1 0.001 0.001 0.001 0.001 {method '_close_file' of
'tables.hdf5extension.File' objects}
1 0.001 0.001 0.005 0.005 utils.py:13(<module>)
998/883 0.001 0.000 0.001 0.000 {len}
1 0.001 0.001 0.005 0.005 index.py:13(<module>)
73 0.001 0.000 0.001 0.000 {built-in method __new__ of
type object at 0x7fa28cc79880}
4 0.001 0.000 214.244 53.561 internals.py:4352(is_null)
15 0.001 0.000 0.001 0.000
sre_compile.py:207(_optimize_charset)
26 0.001 0.000 0.002 0.000
description.py:175(_subclass_from_prefix)
272 0.001 0.000 0.001 0.000 {method 'sort' of 'list'
objects}
1 0.001 0.001 0.008 0.008 group.py:13(<module>)
138 0.001 0.000 0.003 0.000
attributeset.py:61(issysattrname)
14/9 0.001 0.000 0.003 0.000 sre_parse.py:379(_parse)
320 0.001 0.000 0.001 0.000 inspect.py:67(ismethod)
5 0.001 0.000 0.001 0.000 {method '_g_get_lchild_attr'
of 'tables.hdf5extension.Group' objects}
290 0.001 0.000 0.001 0.000 {method 'format' of 'str'
objects}
8 0.001 0.000 0.002 0.000
internals.py:2256(_rebuild_blknos_and_blklocs)
1 0.001 0.001 0.001 0.001 exceptions.py:13(<module>)
I have two machines (very similar setup, ~96cores, 1TB RAM) that show very
different read performance (49s vs 1586s)
The only thing I'm doing is reading a h5 container in pandas:
pd.read_hdf("foo.h5", "data")
I am using IPython to profile it. Both machines run:
%prun pd.read_hdf("foo.h5", "data")
What could be causing this? None of the machine was under heavy load.
Thanks,
Tim
______________
Log
on machine 1
6023 function calls (5861 primitive calls) in 49.204 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
9 16.088 1.788 16.286 1.810 numeric.py:2328(array_equal)
1 11.735 11.735 11.735 11.735 {method '_read_records' of
'tables.tableextension.Table' objects}
7 6.463 0.923 6.463 0.923 {method 'astype' of
'numpy.ndarray' objects}
4 4.858 1.215 4.858 1.215 {method 'copy' of
'numpy.ndarray' objects}
13 2.941 0.226 2.941 0.226
{numpy.core.multiarray.concatenate}
1 2.395 2.395 2.395 2.395
{pandas.algos.take_2d_axis0_float64_float64}
122 2.061 0.017 2.061 0.017 {numpy.core.multiarray.array}
24 1.273 0.053 1.273 0.053 {method '_g_get_objinfo' of
'tables.hdf5extension.Group' objects}
2 0.960 0.480 0.960 0.480
{pandas.algos.take_2d_axis0_int64_int64}
97 0.200 0.002 0.200 0.002 {method 'reduce' of
'numpy.ufunc' objects}
1 0.113 0.113 0.113 0.113
{pandas.algos.take_2d_axis0_bool_bool}
1 0.023 0.023 47.851 47.851 pytables.py:3747(read)
1 0.022 0.022 49.204 49.204 <string>:1(<module>)
4 0.019 0.005 0.019 0.005 {posix.access}
1 0.018 0.018 0.037 0.037 {method '_g_new' of
'tables.hdf5extension.File' objects}
77 0.002 0.000 0.002 0.000 {method '_g_getattr' of
'tables.hdf5extension.AttributeSet' objects}
1 0.002 0.002 0.002 0.002 {method '_g_read_slice' of
'tables.hdf5extension.Array' objects}
1 0.001 0.001 0.001 0.001
{tables.utilsextension.read_f_attr}
1 0.001 0.001 11.736 11.736 table.py:1859(read)
1 0.001 0.001 0.001 0.001 {method '_close_file' of
'tables.hdf5extension.File' objects}
82 0.001 0.000 0.005 0.000
attributeset.py:281(__getattr__)
5 0.001 0.000 0.001 0.000 {method '_g_get_lchild_attr'
of 'tables.hdf5extension.Group' objects}
4 0.001 0.000 0.002 0.000 {method '_open_array' of
'tables.hdf5extension.Array' objects}
25 0.001 0.000 0.003 0.000 index.py:98(__new__)
734 0.001 0.000 0.001 0.000 {isinstance}
67 0.001 0.000 0.001 0.000 __init__.py:197(u_safe)
4 0.001 0.000 2.944 0.736 merge.py:1096(_concat_blocks)
1 0.001 0.001 3.472 3.472
internals.py:3135(_reindex_indexer_items)
14 0.001 0.000 0.001 0.000 atom.py:114(_normalize_shape)
73 0.001 0.000 0.002 0.000 fromnumeric.py:2259(prod)
25 0.000 0.000 0.001 0.000 {pandas.lib.infer_dtype}
138 0.000 0.000 0.002 0.000
attributeset.py:61(issysattrname)
9 0.000 0.000 0.006 0.001 attributeset.py:200(__init__)
98 0.000 0.000 2.061 0.021 numeric.py:394(asarray)
1 0.000 0.000 0.000 0.000 {pandas.lib.clean_index_list}
16 0.000 0.000 0.001 0.000 internals.py:1864(make_block)
1 0.000 0.000 0.001 0.001 {method '_get_info' of
'tables.tableextension.Table' objects}
12 0.000 0.000 0.001 0.000 index.py:409(is_unique)
98/12 0.000 0.000 0.331 0.028 file.py:1290(_get_node)
186/145 0.000 0.000 1.288 0.009 {getattr}
132 0.000 0.000 0.000 0.000 group.py:815(__setattr__)
20 0.000 0.000 0.001 0.000
common.py:1894(_asarray_tuplesafe)
8/2 0.000 0.000 0.330 0.165 node.py:206(__init__)
4 0.000 0.000 8.523 2.131 pytables.py:1865(convert)
14 0.000 0.000 0.001 0.000 atom.py:486(__init__)
on machine 2
28237 function calls (27926 primitive calls) in 1586.526 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
9 517.605 57.512 517.605 57.512 {method 'copy' of
'numpy.ndarray' objects}
1 288.453 288.453 288.453 288.453 {method '_read_records' of
'tables.tableextension.Table' objects}
5 220.011 44.002 220.011 44.002 {method 'astype' of
'numpy.ndarray' objects}
8 214.218 26.777 214.218 26.777 {method 'ravel' of
'numpy.ndarray' objects}
1 178.232 178.232 178.232 178.232
{pandas.algos.take_2d_axis0_float64_float64}
2 72.741 36.371 72.741 36.371
{pandas.algos.take_2d_axis0_int64_int64}
109/108 44.865 0.412 44.865 0.415 {numpy.core.multiarray.array}
7 35.176 5.025 35.240 5.034 numeric.py:2328(array_equal)
1 14.484 14.484 14.484 14.484
{pandas.algos.take_2d_axis0_bool_bool}
1 0.267 0.267 1586.385 1586.385 pytables.py:3966(read)
4 0.220 0.055 214.465 53.616
internals.py:4051(get_empty_dtype_and_na)
100 0.065 0.001 0.065 0.001 {method 'reduce' of
'numpy.ufunc' objects}
42 0.029 0.001 0.029 0.001 {method '_g_get_objinfo' of
'tables.hdf5extension.Group' objects}
4 0.025 0.006 0.025 0.006
common.py:282(_isnull_ndarraylike)
243 0.017 0.000 0.037 0.000 inspect.py:247(getmembers)
7884/7843 0.009 0.000 0.052 0.000 {getattr}
1 0.008 0.008 0.082 0.082 __init__.py:20(<module>)
8150 0.007 0.000 0.007 0.000 {method 'append' of 'list'
objects}
243 0.005 0.000 0.005 0.000 {dir}
1 0.004 0.004 0.004 0.004 {method '_g_new' of
'tables.hdf5extension.File' objects}
1 0.004 0.004 0.054 0.054 file.py:20(<module>)
1 0.003 0.003 0.091 0.091 pytables.py:380(__init__)
315 0.003 0.000 0.043 0.000 _past.py:19(previous_api)
74 0.003 0.000 0.003 0.000 __init__.py:199(u_safe)
77 0.002 0.000 0.002 0.000 {method '_g_getattr' of
'tables.hdf5extension.AttributeSet' objects}
1522/1520 0.002 0.000 0.003 0.000 {isinstance}
1 0.002 0.002 0.020 0.020 table.py:13(<module>)
1 0.002 0.002 0.017 0.017 description.py:13(<module>)
1 0.002 0.002 0.010 0.010 atom.py:13(<module>)
92 0.001 0.000 0.007 0.000
attributeset.py:281(__getattr__)
1 0.001 0.001 288.455 288.455 table.py:1859(read)
1 0.001 0.001 0.001 0.001 {method '_close_file' of
'tables.hdf5extension.File' objects}
1 0.001 0.001 0.005 0.005 utils.py:13(<module>)
998/883 0.001 0.000 0.001 0.000 {len}
1 0.001 0.001 0.005 0.005 index.py:13(<module>)
73 0.001 0.000 0.001 0.000 {built-in method __new__ of
type object at 0x7fa28cc79880}
4 0.001 0.000 214.244 53.561 internals.py:4352(is_null)
15 0.001 0.000 0.001 0.000
sre_compile.py:207(_optimize_charset)
26 0.001 0.000 0.002 0.000
description.py:175(_subclass_from_prefix)
272 0.001 0.000 0.001 0.000 {method 'sort' of 'list'
objects}
1 0.001 0.001 0.008 0.008 group.py:13(<module>)
138 0.001 0.000 0.003 0.000
attributeset.py:61(issysattrname)
14/9 0.001 0.000 0.003 0.000 sre_parse.py:379(_parse)
320 0.001 0.000 0.001 0.000 inspect.py:67(ismethod)
5 0.001 0.000 0.001 0.000 {method '_g_get_lchild_attr'
of 'tables.hdf5extension.Group' objects}
290 0.001 0.000 0.001 0.000 {method 'format' of 'str'
objects}
8 0.001 0.000 0.002 0.000
internals.py:2256(_rebuild_blknos_and_blklocs)
1 0.001 0.001 0.001 0.001 exceptions.py:13(<module>)
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.