Discussion:
[pytables-users] Slicing and dicing a large three dimensional dataset
Jeremy Solbrig
2016-04-20 19:14:45 UTC
I am working with a large (30 GB) three-dimensional dataset (latitude,
longitude, and time). I need to be able to access the data in two primary
ways: 1) data for all latitudes and longitudes, but for a single time, and
2) data for all times, but for a single lat/lon pair. When building the
dataset, I currently let PyTables determine the chunkshape. Reading the
data back is then incredibly slow, which is to be expected.

My thought for how best to structure the data for quick access is to
actually output two different sets of data: one with a chunkshape of
[N_lats, N_lons, 1] and one with a chunkshape of [1, 1, N_times]. Is
there any reason (other than size on disk) that this would become
problematic? Is there a better way to do this?

Thanks,
Jeremy
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
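To make the trade-off in the question concrete, here is a rough sketch that counts how many chunks each of the two access patterns touches under each proposed chunkshape. The helper chunks_touched and the array dimensions (180 x 360 x 1000) are assumptions made for illustration only; they are not part of PyTables and are not taken from the original post. The sketch assumes that a chunk is read as a whole whenever any part of it is requested.

```python
def chunks_touched(chunkshape, selection):
    """Count the chunks that a rectangular selection overlaps.

    chunkshape: chunk length along each axis.
    selection:  one (start, stop) pair per axis, stop exclusive.
    """
    count = 1
    for (start, stop), chunk_len in zip(selection, chunkshape):
        first = start // chunk_len        # index of first chunk hit on this axis
        last = (stop - 1) // chunk_len    # index of last chunk hit on this axis
        count *= last - first + 1
    return count


# Hypothetical dimensions, chosen only for the arithmetic.
n_lats, n_lons, n_times = 180, 360, 1000

# The two access patterns from the question.
one_time = ((0, n_lats), (0, n_lons), (5, 6))   # all lat/lon, single time
one_point = ((10, 11), (20, 21), (0, n_times))  # single lat/lon, all times

# The two chunkshapes proposed in the question.
map_chunks = (n_lats, n_lons, 1)
series_chunks = (1, 1, n_times)

results = {
    "map layout, one time": chunks_touched(map_chunks, one_time),
    "map layout, one point": chunks_touched(map_chunks, one_point),
    "series layout, one time": chunks_touched(series_chunks, one_time),
    "series layout, one point": chunks_touched(series_chunks, one_point),
}

for label, n in results.items():
    print(f"{label}: {n} chunk(s)")
```

Each layout serves its own access pattern with a single chunk, while the mismatched pattern has to visit every chunk in the dataset, which is why a single compromise chunkshape tends to be slow for at least one of the two reads.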
Francesc Alted
2016-04-21 08:35:30 UTC
Hi Jeremy,

Yeah, having two different datasets with different chunkshapes is, IMO, the
best way to optimize access to your big dataset. Of course, the size on
disk will double, but other than that I don't see any problem.

Francesc
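For readers who want to try the two-dataset approach, a minimal sketch follows. It is not code from either poster. The node names by_time and by_point, the file name, and the tiny array sizes are all assumptions for illustration; only tables.open_file, create_carray, Float32Atom and the chunkshape argument are actual PyTables calls.

```python
import os
import tempfile

import numpy as np
import tables

# Small illustrative sizes (hypothetical); the real dataset is far larger.
n_lats, n_lons, n_times = 18, 36, 50
shape = (n_lats, n_lons, n_times)
data = np.arange(n_lats * n_lons * n_times, dtype=np.float32).reshape(shape)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "two_layouts.h5")  # hypothetical file name

    with tables.open_file(path, mode="w") as h5:
        atom = tables.Float32Atom()
        # Copy 1: one chunk per time step, for "all lat/lon, single time".
        by_time = h5.create_carray(h5.root, "by_time", atom=atom, shape=shape,
                                   chunkshape=(n_lats, n_lons, 1))
        # Copy 2: one chunk per grid point, for "single lat/lon, all times".
        by_point = h5.create_carray(h5.root, "by_point", atom=atom, shape=shape,
                                    chunkshape=(1, 1, n_times))
        # Whole-array assignment is fine for a toy array; a 30 GB dataset
        # would have to be written in pieces that fit in memory.
        by_time[...] = data
        by_point[...] = data

    with tables.open_file(path, mode="r") as h5:
        # Each read goes to the copy whose chunks match the request.
        time_slice = h5.root.by_time[:, :, 7]
        point_series = h5.root.by_point[3, 5, :]
        stored_chunkshapes = (h5.root.by_time.chunkshape,
                              h5.root.by_point.chunkshape)

print(time_slice.shape, point_series.shape, stored_chunkshapes)
```

The cost is the one already mentioned in the thread: the same values are stored twice, and the application has to pick the right copy for each kind of read.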
Jeremy Solbrig
2016-04-21 16:59:06 UTC
Thanks Francesc, I appreciate the feedback.