Xianli Xu
2013-08-07 18:33:31 UTC
Hi all,
I'm developing a data processing service and evaluating whether PyTables is a good fit. Since HDF5 supports hierarchical data, like a tree of folders, can I use such a tree-like structure as a key-value store, for example by storing millions of tables or arrays under one group and randomly accessing any one of them in O(1) time? E.g.:
root/
  user_log/
    uid1 -> table / array (tens of thousands of rows / elements; ETL'ed user log info in int format)
    uid2 -> table / array
    uid3 -> table / array
    uid4 -> table / array
    uid5 -> table / array
    …… (perhaps millions of users)
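To make the layout above concrete, here is a rough sketch of what I have in mind. The file name, the helper names `build_store` and `read_user`, and the sizes are made up for illustration (scaled down to 100 users), and I'm assuming the PyTables 3.x function names. It only shows the access pattern; it says nothing about how lookup cost behaves with millions of nodes.

```python
import os
import tempfile

import numpy as np
import tables


def build_store(path, n_users=100, n_rows=1000):
    """Write one int32 array per user under /user_log (hypothetical layout)."""
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "user_log")
        for i in range(n_users):
            # Dummy data: every element of user i's array equals i.
            data = np.full(n_rows, i, dtype=np.int32)
            h5.create_array(group, f"uid{i}", data)


def read_user(path, uid):
    """Look up a single user's array by name and load it into memory."""
    with tables.open_file(path, mode="r") as h5:
        node = h5.get_node("/user_log", uid)
        return node.read()


# Scaled-down example: 100 users instead of millions.
path = os.path.join(tempfile.mkdtemp(), "user_log.h5")
build_store(path)
rows = read_user(path, "uid42")
```

So the "key" is the node name under `/user_log` and the "value" is the whole array or table stored at that node.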
I'm wondering how the hierarchical structure is implemented and whether this usage pattern is supported. If not, is there a workaround or a better way to store this type of information? We chose PyTables because the data is stored more densely, loads faster, and carries no ACID / concurrency overhead, so traditional databases and NoSQL databases are not an option for us.
Thanks,
Jason