[pytables-users] mark/undo scope?

Discussion:

Ken Walker

2018-04-03 15:07:53 UTC

I am trying to interpret the documentation about operations that work w/
Mark/Undo:
*Hierarchy manipulation and attribute handling made after a mark can be
undone.*
*Data handling operations can not be undone nor redone.*

Based on my tests, I concluded modifications to a table's data CANNOT be
Undone (data handling?).
However, I can use table.copy(), modify the copy, and the table.copy() can
be Undone (hierarchy manipulation).
Correct?

I ask because I need to modify some values after extracting and before
doing statistical calculations. It is easier to get the data I need with
the PyTables slice and query tools, instead of operating on unstructured
NumPy arrays. Please advise if there's a better way to do this.

*Test #1 (did not work):*
file = tables.open_file()
file.enable_undo()
file.mark()
--makes some changes to the table w/ table.modify_coordinates()--
file.undo()
file.disable_undo()

*Test #2 (works):*
file = tables.open_file()
file.enable_undo()
file.mark()
table.copy()
--makes some changes to the copied table w/ table.modify_coordinates()--
file.undo()
file.disable_undo()

I expected Undo to work on the file created in test 1 based on this
statement from the PyTables doc:
*class tables.File()*
The in-memory representation of a PyTables file.

That means an in-memory representation of the hierarchy and data
structures, but not the data in a PyTables file, right?
(this makes sense given the large data sets PyTables is designed to handle)

Thanks,
-Ken

--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Francesc Alted

2018-04-03 15:55:03 UTC

Hi Ken,

Post by Ken Walker
I am trying to interpret the documentation about operations that work w/
Mark/Undo:
*Hierarchy manipulation and attribute handling made after a mark can be
undone.*
*Data handling operations can not be undone nor redone.*
Based on my tests, I concluded modifications to a table's data CANNOT be
Undone (data handling?).
However, I can use table.copy(), modify the copy, and the table.copy() can
be Undone (hierarchy manipulation).
Correct?

Correct.

Post by Ken Walker
I ask because I need to modify some values after extracting and before
doing statistical calculations. It is easier to get the data I need with
the PyTables slice and query tools, instead of operating on unstructured
NumPy arrays. Please advise if there's a better way to do this.
*Test #1 (did not work):*
file = tables.open_file()
file.enable_undo()
file.mark()
--makes some changes to the table w/ table.modify_coordinates()--
file.undo()
file.disable_undo()

Yes, this is *not* supposed to work.

Post by Ken Walker
*Test #2 (works):*
file = tables.open_file()
file.enable_undo()
file.mark()
table.copy()
--makes some changes to the copied table w/ table.modify_coordinates()--
file.undo()
file.disable_undo()
I expected Undo to work on the file created in test 1 based on this
statement from the PyTables doc:
*class tables.File()*
The in-memory representation of a PyTables file.
That means an in-memory representation of the hierarchy and data
structures, but not the data in a PyTables file, right?
(this makes sense given the large data sets PyTables is designed to handle)

Well, both the hierarchy and the data exist in the file, but tables.File()
replicates an in-memory representation of the *hierarchy* structure. Hmm,
perhaps a better wording for this would be: "tables.File() : The hierarchy
handler for the PyTables file." Your test #1 modified the data, and
that is why the operation cannot be undone.
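
To make the distinction concrete, here is a minimal sketch that runs both tests in one script. The file name undo_demo.h5, the table names, and the sample rows are made up for illustration; only the PyTables calls mirror the ones discussed above. It assumes the behavior described in this thread: undo() rolls back the table.copy() (hierarchy manipulation) but leaves the values written by modify_coordinates() in place (data handling).

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file, table names, and rows, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "undo_demo.h5")
rows = np.array([(0, 1.0), (1, 2.0), (2, 3.0)],
                dtype=[("id", "i4"), ("value", "f8")])

with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "data", obj=rows)

    h5.enable_undo()
    h5.mark()

    # Test #1: data handling. This write is not recorded in the undo log.
    table.modify_coordinates([0], np.array([(0, 99.0)], dtype=rows.dtype))
    table.flush()

    # Test #2: hierarchy manipulation. Creating the copy is recorded.
    copy = table.copy("/", "data_copy")
    copy.modify_coordinates([1], np.array([(1, -1.0)], dtype=rows.dtype))
    copy.flush()

    h5.undo()  # go back to the mark

    copy_removed = "/data_copy" not in h5      # the copied table is gone
    first_value = float(table[0]["value"])     # the edit to /data survived

    h5.disable_undo()

print(copy_removed, first_value)
```

If the goal is a scratch table that disappears afterwards, this is the pattern from test #2; the original table must not be edited directly if its values need to be preserved.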

HTH,

--
Francesc Alted

Ken Walker

2018-04-03 16:49:09 UTC

Francesc, thanks. That confirms my thoughts.
I could use opinions from PyTables experts on the best order of operations
when I need to both 1) modify the data and 2) query/slice (then perform
statistical analysis of the modified/sliced data).
Option 1:
I found a nice approach w/ PyTables to get table coordinates then slice the
data I need. (I have to do a lot of repeated queries for different subsets of
data, and PyTables is awesome at this!) To use this method, I need to create
a temp table to hold the modified data, and use mark/table_copy/undo to
remove it when done.
Option 2:
I could extract the original data as an unstructured NumPy array, then
modify as needed. This gets more complicated because I still need to
query/slice the data after modification, and the NumPy tools for this
aren't as robust (I'm spoiled!). Also, the modifications
require additional queries on data in 2 other tables, so the easier
approach is using PyTables methods.

However, I'm still learning, so open to whatever method is "best".
FYI, this is not my data, hierarchy, or structure, so I have to "learn to
live with it". :-)

Thanks,
-Ken


Francesc Alted

2018-04-04 06:05:28 UTC

Post by Ken Walker
Francesc, thanks. That confirms my thoughts.
I could use opinions from PyTables experts on the best order of operations
when I need to both 1) modify the data and 2) query/slice (then perform
statistical analysis of the modified/sliced data).
Option 1:
I found a nice approach w/ PyTables to get table coordinates then slice
the data I need. (I have to do a lot of repeated queries for different subsets
of data, and PyTables is awesome at this!) To use this method, I need to
create a temp table to hold the modified data, and use mark/table_copy/undo
to remove it when done.
Option 2:
I could extract the original data as an unstructured NumPy array, then
modify as needed. This gets more complicated because I still need to
query/slice the data after modification, and the NumPy tools for this
aren't as robust (I'm spoiled!). Also, the modifications
require additional queries on data in 2 other tables, so the easier
approach is using PyTables methods.

A third possibility is to create a new temporary PyTables file in memory,
do your modifications on it, query it, and free it (close it). This would
take even less memory than using a structured NumPy array (because of
compression).
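
A rough sketch of this third possibility follows. The paths, table names, sample rows, and the zlib compression level are all assumptions chosen for the example, not something stated in the thread. The scratch file is opened with the CORE driver and the backing store disabled, so it should live only in RAM and vanish when closed.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical paths and sample data for illustration.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "source.h5")
scratch_path = os.path.join(workdir, "scratch.h5")  # name only, not written

rows = np.array([(0, 1.0), (1, 2.0), (2, 3.0)],
                dtype=[("id", "i4"), ("value", "f8")])
with tables.open_file(source_path, mode="w") as src:
    src.create_table("/", "data", obj=rows)

# Extract the rows of interest from the real file, read-only.
with tables.open_file(source_path, mode="r") as src:
    extracted = src.root.data.read()

# Temporary in-memory file: CORE driver, no backing store on disk.
scratch = tables.open_file(scratch_path, mode="w", driver="H5FD_CORE",
                           driver_core_backing_store=0)
try:
    filters = tables.Filters(complevel=5, complib="zlib")  # assumed settings
    work = scratch.create_table("/", "work", obj=extracted, filters=filters)

    # Modify a subset, then query the modified data with PyTables tools.
    coords = work.get_where_list("id >= 1")
    patched = work.read_coordinates(coords)
    patched["value"] += 100.0
    work.modify_coordinates(coords, patched)
    work.flush()

    mean_value = float(work.read_where("value > 100")["value"].mean())
finally:
    scratch.close()  # frees the in-memory file

scratch_on_disk = os.path.exists(scratch_path)
print(mean_value, scratch_on_disk)
```

Only the extracted rows are held in memory here, which is why this variant may suit large files on machines with little RAM.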

However, my suggested solution for your scenario is to use the CORE driver
for HDF5 when opening your PyTables file, with the parameter
driver_core_backing_store set to False (0); this lets you modify the file
*in memory* and query it without modifying the file on disk at all.
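
As a hedged illustration of that suggestion: the sketch below builds a small sample file (hypothetical name and contents), reopens it in append mode with driver="H5FD_CORE" and driver_core_backing_store=0, edits and queries it, and then reopens it normally to check the values on disk. It assumes the behavior described above, namely that edits stay in RAM and are discarded on close.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical sample file standing in for the real data file.
path = os.path.join(tempfile.mkdtemp(), "sample.h5")
rows = np.array([(0, 1.0), (1, 2.0), (2, 3.0)],
                dtype=[("id", "i4"), ("value", "f8")])
with tables.open_file(path, mode="w") as h5:
    h5.create_table("/", "data", obj=rows)

# Load the whole file into memory; with the backing store off,
# nothing is written back to disk when the file is closed.
with tables.open_file(path, mode="a", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5:
    table = h5.root.data
    coords = table.get_where_list("value > 1.5")
    patched = table.read_coordinates(coords)
    patched["value"] *= 10
    table.modify_coordinates(coords, patched)
    table.flush()

    # Queries now see the modified, in-memory data.
    in_memory = table.read_where("value > 15")["value"].tolist()

# Reopen normally to inspect what is actually stored on disk.
with tables.open_file(path, mode="r") as h5:
    on_disk = h5.root.data.col("value").tolist()

print(in_memory, on_disk)
```

Because the entire file is read into RAM, this fits best when the file is small relative to available memory.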

For more info on these two possibilities, read the cookbook at
https://www.pytables.org/cookbook/inmemory_hdf5_files.html (especially
https://www.pytables.org/cookbook/inmemory_hdf5_files.html#backing-store).

Francesc


Ken Walker

2018-04-04 13:27:49 UTC

Francesc,
Thanks for the pointer to the cookbook. I haven't read that section before.
The suggestion to use driver="H5FD_CORE" and driver_core_backing_store=0
sounds like a great solution.
I will investigate. An in-memory copy of the file should be fine, as my
files aren't big by HDF5 standards.
Your third possibility sounds like an option when I have big files and low
RAM resources.
Thanks! That's why I asked. :)
-Ken
