Discussion:
Clear chunks from CArray
Giovanni Luca Ciampaglia
2013-08-05 19:09:01 UTC
Hello all,

is there a way to clear out a chunk from a CArray? I noticed that setting the
data to zero actually takes disk space, i.e.

***
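from tables import open_file, BoolAtom

h5f = open_file('test.h5', 'w')
# 1000x1000 boolean array, stored as one chunk per row
ca = h5f.create_carray(h5f.root, 'carray', BoolAtom(), shape=(1000, 1000),
                       chunkshape=(1, 1000))
# assigning False (the default value) still writes every chunk
ca[:, :] = False
h5f.close()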
***

The resulting file takes 249K ...

Best,
--
Giovanni Luca Ciampaglia

Postdoctoral fellow
Center for Complex Networks and Systems Research
Indiana University

✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
☞ http://cnets.indiana.edu/
✉ ***@indiana.edu
Anthony Scopatz
2013-08-05 19:14:12 UTC
Hello Giovanni,

I think you may need to del that slice and then possibly repack. Hope this
helps.

Be Well
Anthony


Giovanni Luca Ciampaglia
2013-08-05 20:14:54 UTC
Hi Anthony,

what do you mean precisely? I tried

del ca[:,:]

but CArray does not support __delitem__. Looking in the documentation I could
only find a method called remove_rows, but it's in Table, not CArray. Maybe I am
missing something?

Thanks,

Giovanni
Anthony Scopatz
2013-08-05 22:45:44 UTC
Huh, it should... This is definitely an oversight on our part. If you
could please open an issue for this, or better yet write a pull
request that implements __delitem__, that'd be great!

So I think you are right that there is no current way to delete rows from a
CArray. Oops! (Of course, I may still be missing something as well).

It looks like EArray has this problem too; otherwise I would just tell
you to use that.

Be Well
Anthony
Antonio Valentino
2013-08-06 06:29:55 UTC
Hi Anthony, hi Giovanni,
I'm not sure I understand the problem.

The "truncate" method of arrays can be used to remove rows from an extendable array.
It does not seem to be documented, so we should add it to the User's Guide.

CArrays cannot be resized.
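
For reference, truncating an extendable array might look roughly like the sketch below. It assumes PyTables 3.x naming (`open_file`, `create_earray`); the helper name `truncate_demo` and the file and node names are invented for the example.

```python
import os
import tempfile

import numpy as np
from tables import BoolAtom, open_file


def truncate_demo(path, keep_rows=4):
    """Append 10 rows to an EArray, truncate it, and return the stored shape."""
    with open_file(path, "w") as h5f:
        # A 0 in the shape marks the extendable (main) dimension.
        ea = h5f.create_earray(h5f.root, "earray", BoolAtom(), shape=(0, 1000))
        ea.append(np.ones((10, 1000), dtype=bool))
        # Keep only the first `keep_rows` rows along the main dimension.
        ea.truncate(keep_rows)
    # Reopen the file to confirm the new shape is what ended up on disk.
    with open_file(path, "r") as h5f:
        return h5f.root.earray.shape


with tempfile.TemporaryDirectory() as tmp:
    remaining_shape = truncate_demo(os.path.join(tmp, "earray.h5"))
    print(remaining_shape)
```

This only shortens the array along its main dimension; it does not clear chunks in the middle of a dataset.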
HDF5 efficiently handles chunks that have never been written, which saves some disk space, but I doubt that chunks can be "de-initialized".
If my understanding is correct, once one writes some value in a chunk (even if it is the default value), the chunk is allocated at the HDF5 level and written to disk.
At that point one can only change item values.
Also, I doubt that a repack can help in this case (not tested).
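
One way to check this is to compare the size of a file whose CArray was never written with one where every chunk was assigned the default value. The following is a minimal sketch, assuming PyTables 3.x; the helper name `carray_file_size` and the file names are made up for illustration, and the exact sizes will depend on the HDF5 version.

```python
import os
import tempfile

from tables import BoolAtom, open_file


def carray_file_size(path, write):
    """Create a 1000x1000 boolean CArray and return the file size in bytes.

    If `write` is true, every chunk is assigned False (the default value),
    which makes HDF5 allocate all of them.
    """
    with open_file(path, "w") as h5f:
        ca = h5f.create_carray(h5f.root, "carray", BoolAtom(),
                               shape=(1000, 1000), chunkshape=(1, 1000))
        if write:
            ca[:, :] = False
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    untouched = carray_file_size(os.path.join(tmp, "untouched.h5"), write=False)
    written = carray_file_size(os.path.join(tmp, "written.h5"), write=True)

print("never written:", untouched, "bytes")
print("all False:    ", written, "bytes")
```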

The only solution IMO is compression.


cheers

--
Antonio Valentino
Anthony Scopatz
2013-08-06 08:10:47 UTC
Hi Antonio,

Now that you mention it, I think you are right that there is no way to
remove a chunk from an existing data set. If you think about it, this makes
a lot of sense, since you would have to alter the B-tree in strange and
unfortunate ways. So HDF5 doesn't even try. However, you can fake it by
copying over only the data you want to keep, deleting the old data set, and
repacking. Or you can fake it by copying over the data you want to keep to
a new file. Neither of these is ideal, but they would work.
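
The copy-to-a-new-file variant could be sketched as follows. This is only an illustration, assuming PyTables 3.x and a 2-D CArray chunked one row per chunk; the helper `copy_without_rows` and the file names are hypothetical, not part of PyTables.

```python
import os
import tempfile

from tables import BoolAtom, open_file


def copy_without_rows(src_path, dst_path, name, rows_to_drop):
    """Copy CArray `name` into a new file, leaving `rows_to_drop` unwritten.

    Assumes a 2-D array chunked one row per chunk, so a row that is never
    written stays an unallocated chunk in the destination file.
    """
    drop = set(rows_to_drop)
    with open_file(src_path, "r") as src, open_file(dst_path, "w") as dst:
        old = src.get_node("/", name)
        new = dst.create_carray(dst.root, name, atom=old.atom,
                                shape=old.shape, chunkshape=old.chunkshape,
                                filters=old.filters)
        for i in range(old.shape[0]):
            if i not in drop:
                new[i, :] = old[i, :]


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "full.h5")
    dst_path = os.path.join(tmp, "sparse.h5")

    # Source file: all 100 row-chunks are written.
    with open_file(src_path, "w") as h5f:
        ca = h5f.create_carray(h5f.root, "carray", BoolAtom(),
                               shape=(100, 1000), chunkshape=(1, 1000))
        ca[:, :] = True

    # Keep rows 0-9 and "clear" the other 90 by never writing them.
    copy_without_rows(src_path, dst_path, "carray", range(10, 100))

    src_size = os.path.getsize(src_path)
    dst_size = os.path.getsize(dst_path)
    with open_file(dst_path, "r") as h5f:
        kept = h5f.root.carray[:10, :]
        cleared = h5f.root.carray[10:, :]

print("full file:  ", src_size, "bytes")
print("sparse file:", dst_size, "bytes")
```

Rows that were skipped read back as the atom's default value (False here), so readers of the new file see a "cleared" region without it taking chunk storage.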

For existing data sets, especially if you are using compression, setting
all of the contents of a chunk to the same value should work
extraordinarily well.
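
As a rough illustration of the compression route, the sketch below writes the same constant array with and without a zlib filter and compares file sizes. It assumes PyTables 3.x; the helper name `constant_carray_size`, the chunk shape, and the compression level are arbitrary choices made for the example.

```python
import os
import tempfile

from tables import BoolAtom, Filters, open_file


def constant_carray_size(path, filters=None):
    """Write an all-False 1000x1000 CArray and return the file size in bytes."""
    with open_file(path, "w") as h5f:
        ca = h5f.create_carray(h5f.root, "carray", BoolAtom(),
                               shape=(1000, 1000), chunkshape=(100, 1000),
                               filters=filters)
        ca[:, :] = False
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    plain = constant_carray_size(os.path.join(tmp, "plain.h5"))
    packed = constant_carray_size(
        os.path.join(tmp, "packed.h5"),
        filters=Filters(complevel=5, complib="zlib"),
    )

print("uncompressed:", plain, "bytes")
print("zlib level 5:", packed, "bytes")
```

Filters are fixed when the dataset is created, so an existing uncompressed CArray would have to be copied into a new, compressed one to benefit.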

Be Well
Anthony


Giovanni Luca Ciampaglia
2013-08-06 15:07:00 UTC
Hi Anthony and Antonio,

thanks for the explanations. I was hoping I could do it programmatically, but if
it is an inherent limitation of HDF5 there is little I can do. Compression plus
a sensible chunk size should help though.

Best,

Giovanni