Discussion:
Storing small files in filenodes
Valeriy Sokolov
2013-07-17 20:14:55 UTC
Permalink
Not sure if the quoted message was delivered to the list (maybe because I
was not subscribed to this list), so reposting it this way...

On Fri, Jul 12, 2013 at 5:40 PM, Valeriy Sokolov wrote:
Hi,

I am trying to store lots of small (~2 KB) files in PyTables filenodes,
and I have run into trouble with size overhead.

200 such files, which take ~2 MB in total on the filesystem, occupy 14 MB
in the .h5 file produced by PyTables. My experiments show that if I create
200 filenodes and store a single byte in each, I still get a 14 MB .h5
file. Only from roughly 200 KB per filenode onwards does the file size
grow linearly with the payload: 400 KB per node gives an 89 MB file, and
800 KB per node gives 164 MB.

But I would like to store only ~2 KB per node, and the current overhead
(about 70 KB per filenode) is huge.
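
Here is a minimal sketch of what I am doing (untested as posted; the file
and node names are just examples, and I am assuming the PyTables 3.x
method names):

import tables
from tables.nodes import filenode

# illustrative names; the real code loops over actual files on disk
h5 = tables.open_file('small_files.h5', mode='w')
for i in range(200):
    # every filenode becomes its own HDF5 dataset
    fn = filenode.new_node(h5, where='/', name='file_%03d' % i)
    fn.write(b'x' * 2048)  # ~2 KB payload per node
    fn.close()
h5.close()
# the resulting small_files.h5 is ~14 MB, i.e. ~70 KB per stored file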
Could you please help me with a workaround for this issue?
Thank you in advance.
--
Best regards,
Valeriy Sokolov.
Anthony Scopatz
2013-07-17 23:39:15 UTC
Permalink
Hello Valeriy,

For better or worse, this is exactly the performance I would expect. What
you are running up against is that every HDF5 dataset has 64 KB of header
space for meta-information. There is no way of changing this without
invalidating the HDF5 spec. The fact that you are seeing an average of
70 KB per dataset is consistent with this, since datasets don't need to be
stored contiguously.

I would suggest that you use a single VLArray [1] of length-1 string
atoms, with one row per file. You'll lose the filenode interface, but
you'll also lose the 3200% overhead =).
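
Something along these lines (a minimal, untested sketch; the file and
array names are mine, and I'm assuming the PyTables 3.x method names):

import numpy as np
import tables

# illustrative file/array names
h5 = tables.open_file('small_files_vl.h5', mode='w')
# one VLArray holds every file; each row is one file's bytes
files = h5.create_vlarray('/', 'files',
                          atom=tables.StringAtom(itemsize=1),
                          title='small files, one per row')
for payload in (b'first file contents', b'second file contents'):
    # store the file's bytes as a variable-length row of 1-byte strings
    files.append(np.frombuffer(payload, dtype='S1'))
h5.close()

Since all of the files live in a single dataset, you pay the 64 KB header
cost once rather than once per file; the row number takes the place of the
node name as your index into the files.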

Be Well
Anthony

1. http://pytables.github.io/usersguide/libref/homogenous_storage.html#the-vlarray-class


Valeriy Sokolov
2013-07-18 09:06:02 UTC
Permalink
Thank you, Anthony,

I will try a VLArray as you suggested =)
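
For the record, here is roughly what I plan to try for reading the files
back (untested; it assumes the writing sketch from your message, where
each row of /files holds one file's bytes as 1-byte strings):

import tables

h5 = tables.open_file('small_files_vl.h5', mode='r')
files = h5.root.files
for i in range(files.nrows):
    # join the row of 1-byte strings back into the file's bytes
    data = files[i].tostring()
    print('%d: %d bytes' % (i, len(data)))
h5.close()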
--
Best regards,
Valeriy Sokolov.