Discussion:
Question about Leaf.remove() method
Forafo San
2013-08-29 15:40:26 UTC
Hello All,
I have some data in an HDF5 file that is created with PyTables.
Occasionally, I update the data by reading in one of the tables and adding
or deleting rows. Then, I create a new table containing the updated data,
give it a random name, and let it reside in the same group where the old
table resides. I flush the new table, then use the table.remove() (or
Leaf.remove()) method to delete the old table and table.rename() method to
rename the randomly-named new table to the same name as the old table.
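
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the helper name replace_table, its arguments, and the "tmp_" prefix for the random name are hypothetical, and the updated rows are assumed to be a NumPy structured array.

```python
import uuid

import numpy as np
import tables


def replace_table(h5path, group_path, name, new_rows):
    """Swap the table group_path/name for a new table holding new_rows.

    new_rows is assumed to be a NumPy structured array (hypothetical input).
    """
    tmp_name = "tmp_" + uuid.uuid4().hex  # random temporary name
    with tables.open_file(h5path, mode="a") as h5:
        old = h5.get_node(group_path, name)
        # The new table is written while the old one still exists.
        new = h5.create_table(group_path, tmp_name, obj=new_rows)
        new.flush()
        old.remove()      # drop the old table
        new.rename(name)  # give the new table the old name
```

Because the new table is created before the old one is removed, the file has to hold both copies at one point.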

Problem:
With a small table, the HDF5 file doubles in size after the above process,
even when no rows are added and nothing else is modified (assume that the
file contains only this table). ptdump shows no trace of the old table.

With a medium-sized table, the file grows substantially (20% or 30%) even
when no new rows or columns are added.

Am I right that table.remove() deletes the table completely, or does it
leave some residue that I should be aware of?

All help is appreciated. Thanks,
Premal
Anthony Scopatz
2013-08-29 17:21:23 UTC
Hello Premal,

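This is just how HDF5 works. When you delete a Leaf, the reference to that
node is removed and the space it occupied becomes available for future use.
However, HDF5 never shrinks a file; it only grows it. New data can fill the
freed space, but the space itself doesn't go away. Until it is reused, it
just sits there empty. In your case the new table is written while the old
one still exists, so it cannot reuse that space, which would account for
the growth you see.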

If you really want to get rid of this extraneous space you should use the
ptrepack or h5repack command line utilities to create a clean copy of the
file.
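
If you would rather do this from Python than from the command line, a minimal sketch is below. It uses tables.copy_file as a stand-in for ptrepack, on the assumption that copying into a fresh file writes only the nodes that still exist. The helper name repack and the file paths are hypothetical.

```python
import os

import tables


def repack(src, dst):
    """Copy the live nodes of src into a fresh file dst.

    Returns the (src, dst) file sizes in bytes so they can be compared.
    """
    tables.copy_file(src, dst, overwrite=True)
    return os.path.getsize(src), os.path.getsize(dst)
```

Once you have checked the copy, you can move it over the original file, for example with os.replace(dst, src).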

Hope this helps.

Be Well
Anthony
_______________________________________________
Pytables-users mailing list
https://lists.sourceforge.net/lists/listinfo/pytables-users