suitable for storing data like k-v style?

Discussion:

Xianli Xu

2013-08-07 18:33:31 UTC

Hi all,

I'm developing a data processing service and evaluating PyTables. Since HDF5 supports hierarchical data like a tree of folders, can I use such a tree-like structure as a K-V store, i.e. store possibly millions of tables or arrays under one group and randomly access any one of them in O(1) time? E.g.

root/
  user_log/
    uid1 -> table / array (tens of thousands of rows / elements; ETL'ed user log info in int format)
    uid2 -> table / array
    uid3 -> table / array
    uid4 -> table / array
    uid5 -> table / array
    ... (perhaps a million users)

Just wondering how the hierarchical structure is implemented and whether such a usage pattern is supported. If not, is there any running or better way to store this type of information? We adopted PyTables because the data is stored at higher density, loads faster, and carries no ACID / concurrency overhead, so a traditional DB or a NoSQL DB is not an option for us.

Thanks,
Jason

Xianli Xu

2013-08-07 18:39:55 UTC

Oops, sorry, it seems the auto-correction in my email client created a typo for me :P
Here's the correction:

running -> tuning

Anthony Scopatz

2013-08-07 21:02:59 UTC

Hi Jason,

A key-value store pattern is definitely supported. Be forewarned, though,
that groups are implemented using B-trees, not hash tables, so looking up a
child is O(log n) in the number of children rather than O(1). That said, with
data of your size most of the access time will be spent reading the leaf
nodes, not finding them in the group. I'd say try it out and see.
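For anyone who wants to try it, here is a minimal sketch of the layout described in the question, assuming PyTables 3.x and NumPy are installed. The file name, the `build_store` and `lookup` helpers, and the small sizes (2000 users, 1000 rows each) are illustrative assumptions, not anything from the original posts; scale `n_users` up to see how the group lookup behaves on your own hardware.

```python
import os
import tempfile
import time

import numpy as np
import tables


def build_store(path, n_users, rows_per_user):
    """Write one int32 array per user under /user_log (hypothetical layout)."""
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "user_log")
        for i in range(n_users):
            # Fill each array with its own uid so reads can be checked later.
            data = np.full(rows_per_user, i, dtype=np.int32)
            h5.create_array(group, "uid%d" % i, obj=data)


def lookup(h5, uid):
    """Fetch one user's array by key; HDF5 resolves the name via the group's B-tree."""
    return h5.get_node("/user_log", "uid%d" % uid).read()


# Sizes are kept small so the sketch runs quickly; they are assumptions.
path = os.path.join(tempfile.mkdtemp(), "kv_demo.h5")
n_users = 2000
build_store(path, n_users, rows_per_user=1000)

with tables.open_file(path, mode="r") as h5:
    # Random access pattern over the keys, with a fixed seed for repeatability.
    keys = np.random.RandomState(0).randint(0, n_users, size=200)
    start = time.perf_counter()
    results = [lookup(h5, int(k)) for k in keys]
    elapsed = time.perf_counter() - start

print("mean lookup + read time: %.1f us" % (1e6 * elapsed / len(keys)))
```

The timing covers both the name lookup and the read of the array, which is the combined cost that matters for this access pattern. With very wide groups PyTables may emit a PerformanceWarning about the number of children, so it is worth watching for that when increasing `n_users`.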

Be Well
Anthony

_______________________________________________
Pytables-users mailing list
https://lists.sourceforge.net/lists/listinfo/pytables-users