juan jose gomez cadenas
2016-07-17 20:23:29 UTC
Dear users,
I have to store large arrays of raw data corresponding to photomultiplier
waveforms that have not yet been zero suppressed. Specifically, I have 12 PMT
waveforms to store per event, each containing about 1.2 million integers. My
detector records millions of such events, and I am looking for the most
efficient way to store the data.
One possibility (method "a") is to use a table with a nested array column. For example:
class PMTRD(tables.IsDescription):
    # event_id = tables.Int32Col(pos=1, indexed=True)
    event_id = tables.Int32Col(pos=1)
    pmtrd = tables.Int32Col(shape=LEN_PMT, pos=2)
then fill it like this:

pmt = table.row
for i in range(NEVENTS):
    for j in range(NPMTS):
        pmt['event_id'] = i
        pmt['pmtrd'] = raw_data(LEN_PMT)
        # This injects the row values.
        pmt.append()
table.flush()
Here NPMTS = 12, LEN_PMT = 1.2e6, and NEVENTS is a large number. This
scheme works and gives a table with two columns, one for the event id and the
other for the PMT raw data. The raw-data column is multidimensional (each row
holds one large waveform vector). I am not sure this is very efficient; the
table ends up with NPMTS * NEVENTS rows.
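For concreteness, here is a self-contained version of method "a" with toy sizes (the real LEN_PMT is ~1.2e6 and NEVENTS is millions); the raw_data call is replaced by random integers, and the Blosc filter choice is only an assumption, not a recommendation from the list:

```python
import os
import tempfile

import numpy as np
import tables

LEN_PMT = 100   # real value: ~1_200_000
NPMTS = 2       # real value: 12
NEVENTS = 3     # real value: millions

class PMTRD(tables.IsDescription):
    event_id = tables.Int32Col(pos=1)
    pmtrd = tables.Int32Col(shape=LEN_PMT, pos=2)

path = os.path.join(tempfile.mkdtemp(), "pmt_table.h5")
with tables.open_file(path, "w") as h5file:
    # Compression filter is an arbitrary example choice.
    table = h5file.create_table(h5file.root, "pmtrd", PMTRD,
                                filters=tables.Filters(complevel=5,
                                                       complib="blosc"))
    pmt = table.row
    for i in range(NEVENTS):
        for j in range(NPMTS):
            pmt['event_id'] = i
            # Stand-in for raw_data(LEN_PMT): fake 12-bit ADC samples.
            pmt['pmtrd'] = np.random.randint(0, 4096, LEN_PMT,
                                             dtype=np.int32)
            pmt.append()
    table.flush()
    n_rows = table.nrows  # one row per (event, PMT) pair
```

As the snippet shows, the table grows by one row per (event, PMT) pair, so n_rows equals NEVENTS * NPMTS.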
Alternatively (method "b"), one could store the raw data using CArrays:

hcnt = h5file.create_carray(root, name, atom, raw_data.shape,
                            filters=filters)
hcnt[:] = raw_data_per_pmt
One then uses the tree structure of PyTables to organize the data, something
like:

/root/rawData/event1/PMT1   (CArray)
/root/rawData/event1/PMT2   (CArray)
...
/root/rawData/event1/PMT12  (CArray)
/root/rawData/event2/PMT1   (CArray)

and so on...
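Again for concreteness, a runnable sketch of method "b" with toy sizes: one group per event, one compressed CArray per PMT. The zlib filter and the group/node names are just example assumptions:

```python
import os
import tempfile

import numpy as np
import tables

LEN_PMT = 100   # real value: ~1_200_000
NPMTS = 2       # real value: 12
NEVENTS = 2     # real value: millions

filters = tables.Filters(complevel=5, complib="zlib")  # example choice
atom = tables.Int32Atom()

path = os.path.join(tempfile.mkdtemp(), "pmt_carrays.h5")
with tables.open_file(path, "w") as h5file:
    rawdata = h5file.create_group(h5file.root, "rawData")
    for i in range(NEVENTS):
        # /rawData/event1, /rawData/event2, ...
        event = h5file.create_group(rawdata, "event%d" % (i + 1))
        for j in range(NPMTS):
            # /rawData/eventN/PMT1, /rawData/eventN/PMT2, ...
            hcnt = h5file.create_carray(event, "PMT%d" % (j + 1),
                                        atom, (LEN_PMT,),
                                        filters=filters)
            # Stand-in for raw_data_per_pmt: fake 12-bit ADC samples.
            hcnt[:] = np.random.randint(0, 4096, LEN_PMT, dtype=np.int32)
    # Count the CArray leaves under /rawData.
    n_nodes = sum(1 for _ in h5file.walk_nodes("/rawData", "CArray"))
```

This layout creates NEVENTS * NPMTS separate nodes in the file, which is the main structural difference from the single-table approach.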
My question: what is the best (most efficient) way to store such raw data?
Are there any limitations on the number of events that can be stored per file
using method "a" or method "b"? Any tips?
Thanks a lot,
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.