Evan
2016-10-12 04:11:22 UTC
The PyTables library and the HDFStore object (based on PyTables) both
provide indexing for the user.
For the latter case, users instantiate an HDFStore object and then choose
which columns to index.
from pandas import HDFStore
store = HDFStore('file1.hd5')
key = "key_name"
index_columns = ["column1", "column2"]
store.append(key,... data_columns=index_columns)
Here we index on two columns, which should speed up queries that filter on
those columns.
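To make the pandas side concrete, here is a minimal sketch of what I understand the full round trip to look like. The DataFrame contents, the extra "other" column, and the temporary file path are made up for illustration; only the key and the two data columns come from the snippet above. As far as I can tell from the pandas documentation, append() builds indexes on the data columns by default (index=True).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data; only "column1" and "column2" come from the snippet above.
df = pd.DataFrame({
    "column1": np.arange(100),
    "column2": np.arange(100) * 0.5,
    "other": np.arange(100) % 7,
})

# Temporary location chosen for this sketch only.
path = os.path.join(tempfile.mkdtemp(), "file1.h5")

with pd.HDFStore(path, mode="w") as store:
    # data_columns stores these columns so they can be queried on disk.
    store.append("key_name", df, data_columns=["column1", "column2"])
    # The where clause is evaluated against the stored data columns.
    subset = store.select("key_name", where="column1 > 90 & column2 < 48")

print(subset)
```

Columns not listed in data_columns (here "other") are still stored, but as far as I know they cannot be used in a where clause.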
For PyTables alone, we create an HDF5 file as follows (from the
documentation):
from tables import *

class Particle(IsDescription):
    identity = StringCol(itemsize=22, dflt=" ", pos=0)  # character string
    idnumber = Int16Col(dflt=1, pos=1)  # short integer
    speed = Float32Col(dflt=1, pos=2)  # single-precision

# Open a file in "w"rite mode
fileh = open_file("objecttree.h5", mode="w")
# Get the HDF5 root group
root = fileh.root
# Create the groups
group1 = fileh.create_group(root, "group1")
group2 = fileh.create_group(root, "group2")
# Now, create an array in root group
array1 = fileh.create_array(root, "array1", ["string", "array"], "String array")
# Create 1 new table in group1
table1 = fileh.create_table(group1, "table1", Particle)
# Get the record object associated with the table:
row = table1.row
# Fill the table with 10 records
for i in range(10):
    # First, assign the values to the Particle record
    row['identity'] = 'This is particle: %2d' % (i)
    row['idnumber'] = i
    row['speed'] = i * 2.
    # This injects the Record values
    row.append()
# Flush the table buffers
table1.flush()
# Finally, close the file (this also will flush all the remaining buffers!)
fileh.close()
Users index columns by calling "Column.create_index()".
For example:
indexrows = table.cols.var1.create_index()
indexrows = table.cols.var2.create_index()
indexrows = table.cols.var3.create_index()
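For what it is worth, here is a minimal sketch of how the same call might be applied to the Particle table, assuming the fields declared in the IsDescription subclass are what PyTables exposes by name under table.cols. The temporary file path and the final query are additions for illustration and are not taken from the documentation example.

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):
    identity = tables.StringCol(itemsize=22, dflt=" ", pos=0)  # character string
    idnumber = tables.Int16Col(dflt=1, pos=1)                  # short integer
    speed = tables.Float32Col(dflt=1, pos=2)                   # single-precision


# Temporary location chosen for this sketch only.
path = os.path.join(tempfile.mkdtemp(), "objecttree.h5")

with tables.open_file(path, mode="w") as fileh:
    group1 = fileh.create_group(fileh.root, "group1")
    table1 = fileh.create_table(group1, "table1", Particle)

    # Fill the table with 10 records, as in the example above.
    row = table1.row
    for i in range(10):
        row["identity"] = "This is particle: %2d" % i
        row["idnumber"] = i
        row["speed"] = i * 2.0
        row.append()
    table1.flush()

    # Each declared field is reachable by name under table1.cols.
    table1.cols.speed.create_index()
    table1.cols.identity.create_index()

    indexed = (table1.cols.speed.is_indexed, table1.cols.identity.is_indexed)

    # A query on an indexed column; PyTables decides whether to use the index.
    matches = sorted(int(r["idnumber"]) for r in table1.where("speed > 10"))

print(indexed, matches)
```

With only 10 rows an index will not make a measurable difference; the point is only to show where the column names come from.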
Two questions:
(1) In the PyTables example above, it is not clear to me how to set the
indexes. No columns are explicitly defined there. As I read it, there are
three fields: identity, idnumber, and speed. Say I wanted to place an index
on speed and identity. How would one do this?
(2) Are there any benchmarks comparing pandas-based indexing with
PyTables-based indexing? Is one faster than the other? Does one take up
more disk space (i.e., produce a larger HDF5 file) than the other?
Thank you for any help! Apologies for so many questions recently.
Thanks, Evan
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.