Nyirő Gergő
2013-08-05 09:11:32 UTC
Hello,
We are developing a measurement evaluation tool, and we'd like to use
PyTables/HDF5 as a middle layer for signal access.
We have to deal with the silly structure of the recorder device
measurement format.
The signals can be accessed via two identifiers:
* device name: <source of the signal>-<channel of the
message>-<another tag>-<yet another tag>
* signal name
The first identifier describes the source of the signal, and it can
be quite long.
Therefore I grouped the device name into two layers:
/<source of the signal>
    /<channel of the message>...
        /<signal name>
So if you have the same message from two channels, then you will get
/foo-device-name
    /channel-1
        /bar
        /baz
    /channel-2
        /bar
        /baz
Besides loading signals, we have to search for signal names as fast as
possible and return the shortest unique part of the device name
together with the signal name.
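
To make the "shortest unique device name part" concrete, here is a
minimal sketch of one possible reading. It assumes the device name is a
dash-separated list of tags and that "shortest unique part" means the
shortest leading run of tags that no other device name shares. The
function name and the sample device names are hypothetical.

```python
def shortest_unique_prefix(device, all_devices):
    """Return the shortest leading run of dash-separated tags of
    `device` that no other name in `all_devices` starts with.

    Assumption: tags are separated by '-' and contain no '-' themselves.
    """
    tags = device.split("-")
    for n in range(1, len(tags) + 1):
        prefix = tags[:n]
        # Does any *other* device share the same first n tags?
        clash = any(
            other.split("-")[:n] == prefix
            for other in all_devices
            if other != device
        )
        if not clash:
            return "-".join(prefix)
    # No shorter prefix is unique, so fall back to the full name.
    return device


# Hypothetical device names, for illustration only.
devices = ["radar-ch1-a-x", "radar-ch2-a-x", "camera-ch1-b-y"]
print(shortest_unique_prefix("radar-ch1-a-x", devices))
```

In practice `all_devices` would be only the devices that carry the
matched signal name, so the prefix is unique among the search hits
rather than across the whole file.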
Using the structure above, iterating over the group names is quite
slow, so I built a table of device and signal names.
As far as I know, PyTables queries do not support string searching
(e.g. startswith, *foo[0-9]ch*, etc.), so searching this table leads us
to a pure Python loop, which is slow again.
Therefore I built a Python dictionary from the table, which is much
faster to iterate over than the table itself, but the init time
increased from 100 ms to 3-4 s (we have more than 40 000 signals).
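
For reference, the dictionary approach can be sketched roughly as
follows. This is a simplified illustration, not our actual code: it
assumes the table rows have already been read into (device name, signal
name) pairs, and the function names are hypothetical. Glob matching is
done with the standard library's fnmatch module.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def build_index(rows):
    """Map each signal name to the list of device names that carry it.

    `rows` is any iterable of (device_name, signal_name) pairs, e.g.
    the rows of the lookup table once they are read into memory.
    """
    index = defaultdict(list)
    for device, signal in rows:
        index[signal].append(device)
    return index


def glob_signals(index, pattern):
    """Return {signal_name: [device_names]} for every signal name that
    matches the case-sensitive glob `pattern` (e.g. '*foo[0-9]ch*')."""
    return {
        name: devices
        for name, devices in index.items()
        if fnmatchcase(name, pattern)
    }


# Hypothetical rows, for illustration only.
rows = [
    ("foo-device-name-channel-1", "bar"),
    ("foo-device-name-channel-2", "bar"),
    ("foo-device-name-channel-1", "baz"),
]
index = build_index(rows)
print(glob_signals(index, "ba[rz]"))
```

The search itself is a single pass over the keys; the cost we see is in
building the index at start-up.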
Do you have any advice on how to search for group names in HDF5 with
PyTables in an efficient way?
PS: I would be most happy with a glob interface.
Thanks in advance for your advice,
gergo