Discussion:
suitable for storing data like k-v style?
Xianli Xu
2013-08-07 18:33:31 UTC
Hi all,

I'm developing a data processing service and evaluating whether PyTables is suitable. Since HDF5 supports hierarchical data like a tree of folders, can I use such a tree-like structure as a K-V store, for example by storing millions of tables or arrays under one group and randomly accessing any one of them in O(1) time? E.g.:

root/
    user_log/
        uid1-> table / array, (of tens of thousand rows / elements, ETL'ed user log info in int format)
        uid2-> table / array,
        uid3-> table / array,
        uid4-> table / array,
        uid5-> table / array,
        …… (perhaps million user)

I'm just wondering how the hierarchical structure is implemented and whether such a usage pattern is supported. If not, is there any running or better way to store this type of information? We adopted PyTables because the data is stored at higher density, loads faster, and carries no ACID / concurrency overhead, so traditional and NoSQL databases are not an option for us.

Thanks,
Jason
Xianli Xu
2013-08-07 18:39:55 UTC
Oops, sorry, it seems my email client's auto-correction introduced a typo. Here is the correction:

running -> tuning
Anthony Scopatz
2013-08-07 21:02:59 UTC
Hi Jason,

A key-value store pattern is definitely supported. Be forewarned, though,
that groups are implemented using B-trees, not hash tables, so a lookup is
logarithmic in the number of children rather than O(1). With data of your
size, however, most of the access time will be spent reading the leaf nodes
and not getting the group. I'd say try it out and see.
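
A rough way to "try it out" is sketched below. It is only an illustration, not a tuned benchmark: the "uid<N>" key names, the array sizes, the file name, and the helper functions are all made up for the example, and it assumes the PEP 8 style API of PyTables 3.x (open_file, create_group, create_array, get_node).

```python
import os
import tempfile
import time

import numpy as np
import tables


def build_store(path, n_users=1000, rows_per_user=100):
    """Write one integer array per user under /user_log, keyed by node name."""
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "user_log")
        for i in range(n_users):
            # The node name is the key; "uid<N>" is a hypothetical naming scheme.
            data = np.full(rows_per_user, i, dtype=np.int32)
            h5.create_array(group, "uid%d" % i, data)


def lookup(h5, uid):
    """Fetch the whole array stored under the given key."""
    return h5.get_node("/user_log", uid).read()


def time_lookups(path, keys):
    """Return the average wall-clock seconds per keyed lookup (group walk + read)."""
    with tables.open_file(path, mode="r") as h5:
        start = time.perf_counter()
        for key in keys:
            lookup(h5, key)
        return (time.perf_counter() - start) / len(keys)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "kv_demo.h5")
    build_store(path)
    keys = ["uid%d" % i for i in (3, 500, 999, 42)]
    print("average seconds per lookup:", time_lookups(path, keys))
```

Raising n_users and rows_per_user toward the real workload should show whether the group lookup or the leaf read dominates for a given data set.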

Be Well
Anthony