[pytables-users] Grants up to $3k available to NumFOCUS projects (sponsored & affiliated)
Francesc Alted
2017-03-29 19:16:00 UTC
Hi,

Just a short note to tell you that I have received the baton from Anthony
Scopatz as Point of Contact for the PyTables project in the NumFOCUS
foundation. From my new position, and after talking with Gina Helfrich,
the Communications Director of NumFOCUS, we decided that it would be good
to advertise at least a couple of PyTables-related projects that could
benefit from the recent NumFOCUS grant program (see the message below).

Below you can find the projects that I sketched out and that Gina and I
think could benefit from the grant. IMO this is an interesting opportunity
for anyone (especially students) with some experience with PyTables to
contribute to the project and earn some money along the way (it could be a
perfect summer job for some of you). I am volunteering to mentor the work
(for a maximum of 1 or 2 hours a week), so you won't be on your own.

Enjoy data!

---------- Forwarded message ----------
From: *Gina Helfrich* <***@numfocus.org>
Date: Fri, Mar 17, 2017 at 10:51 AM
Subject: Re: [numfocus] Grants up to $3k available to NumFOCUS projects
(sponsored & affiliated)
To: ***@googlegroups.com, ***@numfocus.org


There is no specific template, but proposals should be kept under 2 pages.
Maximum 1 submission per project.

Required elements of the proposal are:

- title
- project description
- benefit to project/community
- project team
- and budget

Submit proposals to ***@numfocus.org

Best,
Gina


PyTables projects for NumFOCUS grants
=====================================

Add h5py as a backend to PyTables
---------------------------------

The goal here is to define a new I/O layer that would allow a future
version of PyTables (probably v4.x) to use different backends. As h5py is
a great interface to HDF5, the first priority is an h5py backend, so that
HDF5 is accessed through it. This way PyTables can leverage h5py to
access the most advanced features of HDF5 while still delivering features
like advanced table management, fast table queries and easy access to the
Blosc meta-compressor (and, through it, to a wide array of codecs like
LZ4, Snappy and Zstandard). You can find a more detailed blog post about
our vision here:
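To make the idea of pluggable backends a bit more concrete, here is a minimal sketch in Python. The `Backend` interface, its method names, and the `MemoryBackend` and `Table` classes are all hypothetical, invented for illustration; they are not the interface actually being designed for PyTables 4.x. The point is only the split of responsibilities: table logic and queries live in the front end, storage lives behind a backend, and an h5py-based backend would implement the same methods while the front-end code stays unchanged.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[object, ...]  # a table row is modelled as a plain tuple in this sketch


class Backend(ABC):
    """Hypothetical storage interface a PyTables-like front end could target."""

    @abstractmethod
    def create_table(self, path: str, colnames: Sequence[str]) -> None: ...

    @abstractmethod
    def colnames(self, path: str) -> Tuple[str, ...]: ...

    @abstractmethod
    def append_rows(self, path: str, rows: Sequence[Row]) -> None: ...

    @abstractmethod
    def read_rows(self, path: str) -> List[Row]: ...


class MemoryBackend(Backend):
    """Toy backend keeping rows in Python lists; an h5py backend would fill the same role."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[Row]] = {}
        self._names: Dict[str, Tuple[str, ...]] = {}

    def create_table(self, path: str, colnames: Sequence[str]) -> None:
        self._rows[path] = []
        self._names[path] = tuple(colnames)

    def colnames(self, path: str) -> Tuple[str, ...]:
        return self._names[path]

    def append_rows(self, path: str, rows: Sequence[Row]) -> None:
        # Validate every row before storing anything.
        ncols = len(self._names[path])
        for row in rows:
            if len(row) != ncols:
                raise ValueError(f"expected {ncols} fields, got {len(row)}")
        self._rows[path].extend(rows)

    def read_rows(self, path: str) -> List[Row]:
        return list(self._rows[path])


class Table:
    """Backend-agnostic front end: query logic lives here, storage lives in the backend."""

    def __init__(self, backend: Backend, path: str, colnames: Sequence[str]) -> None:
        self._backend = backend
        self._path = path
        backend.create_table(path, colnames)

    def append(self, rows: Sequence[Row]) -> None:
        self._backend.append_rows(self._path, rows)

    def select(self, predicate: Callable[[Dict[str, object]], bool]) -> List[Row]:
        # The same query code runs unchanged whichever backend stores the data.
        names = self._backend.colnames(self._path)
        return [row for row in self._backend.read_rows(self._path)
                if predicate(dict(zip(names, row)))]


# Usage: swap MemoryBackend() for another Backend implementation without touching Table.
t = Table(MemoryBackend(), "/detector/readout", ["id", "energy"])
t.append([(1, 0.5), (2, 3.2), (3, 7.9)])
print(t.select(lambda r: r["energy"] > 1.0))  # [(2, 3.2), (3, 7.9)]
```

Real PyTables queries are expressed differently (and evaluated much faster); the Python `select` callable here only stands in for "some query logic that does not care about the storage layer".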

https://hdfgroup.org/wp/2015/09/python-hdf5-a-vision/

In fact, work has already started on that front: in August 2016 a handful
of PyTables core developers gathered with the goal of starting this precise
task, and although we certainly made a lot of progress on the Table object
(the fundamental one in PyTables), there is still quite a bit of work to do.
See the final report that we wrote at the end of our hackfest in Perth:

https://github.com/PyTables/PyTables/blob/pt4/doc/New-Backend-Interface.rst

The idea is to continue the work done so far, publish an alpha release
with the basic Table, CArray, EArray and VLArray objects working, and hopefully
gain enough traction to promptly release a stable version unifying the
best of the PyTables and h5py packages.


Implement column-wise tables in PyTables
----------------------------------------

Both NumPy (with its structured arrays) and PyTables implement tables that
are arranged row-wise. This approach works well for performing lookups in
very large tables with a relatively small record size (typically <= 100 bytes).
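A row-wise table stores each record as one fixed-size, contiguous chunk of bytes, much like a NumPy structured array. The sketch below uses Python's standard `struct` module to illustrate why lookups are cheap in this layout: record `i` starts at byte offset `i * record_size`, so fetching a whole row is a single contiguous slice. The record format (`<iqd`: an int32 id, an int64 timestamp and a float64 value) and the helper names are arbitrary choices for illustration, not anything PyTables defines.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical record layout: int32 id, int64 timestamp, float64 value.
# "<" means little-endian with no alignment padding, so each record is 4 + 8 + 8 = 20 bytes.
RECORD = struct.Struct("<iqd")


def pack_rows(rows: Sequence[Tuple[int, int, float]]) -> bytes:
    """Serialize records back to back, one fixed-size record after another."""
    buf = bytearray(RECORD.size * len(rows))
    for i, row in enumerate(rows):
        RECORD.pack_into(buf, i * RECORD.size, *row)
    return bytes(buf)


def read_row(buf: bytes, i: int) -> Tuple[int, int, float]:
    """Fetch record i with one contiguous read at offset i * record size."""
    return RECORD.unpack_from(buf, i * RECORD.size)


def read_column(buf: bytes, field: int) -> List[object]:
    """Reading a single field still unpacks every record in the buffer."""
    return [record[field] for record in RECORD.iter_unpack(buf)]


data = pack_rows([(1, 1000, 0.5), (2, 2000, 1.5), (3, 3000, 2.5)])
print(RECORD.size)           # 20
print(read_row(data, 1))     # (2, 2000, 1.5)
print(read_column(data, 2))  # [0.5, 1.5, 2.5]
```

The flip side shows up in `read_column`: extracting one field means walking over every record, which is part of what motivates the column-wise layout.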

However, there are many use cases where a column-wise arrangement would
be more beneficial: adding or removing columns efficiently, handling tables
with a much larger record size, supporting columns with variable-length
elements (varchar, ragged arrays or BLOBs), and achieving better compression
(the values along a column are usually more homogeneous, i.e. have lower
entropy, than the values within a row).
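By contrast, a column-wise table keeps each column in its own container. The toy `ColumnTable` class below is hypothetical, pure Python, and not the design described in the proposal; it only illustrates why adding or removing a column is cheap in this layout (the other columns are never rewritten), how a variable-length string column fits in naturally, and how each column can be compressed on its own. Here `json` plus `zlib` stand in for a real binary encoding and codec such as Blosc.

```python
import json
import zlib
from typing import Any, Dict, List


class ColumnTable:
    """Toy column-wise table: one Python list per column (illustration only)."""

    def __init__(self) -> None:
        self._cols: Dict[str, List[Any]] = {}
        self._nrows = 0

    def add_column(self, name: str, values: List[Any]) -> None:
        # Only the new column is created; existing columns are left untouched.
        if self._cols and len(values) != self._nrows:
            raise ValueError("column length does not match table length")
        self._cols[name] = list(values)
        self._nrows = len(values)

    def remove_column(self, name: str) -> None:
        # Dropping a column just discards its container.
        del self._cols[name]

    def column(self, name: str) -> List[Any]:
        return self._cols[name]

    def row(self, i: int) -> Dict[str, Any]:
        # Rebuilding a full row has to visit every column.
        return {name: values[i] for name, values in self._cols.items()}

    def compressed_sizes(self) -> Dict[str, int]:
        # Compress each column independently (JSON + zlib stand in for a real codec).
        return {name: len(zlib.compress(json.dumps(values).encode()))
                for name, values in self._cols.items()}


t = ColumnTable()
t.add_column("price", [10.5, 10.6, 10.4])
t.add_column("ticker", ["AAPL", "GOOG", "BRK.B"])  # variable-length strings
t.add_column("note", ["", "halted briefly", ""])
t.remove_column("note")  # "price" and "ticker" are not rewritten
print(t.row(1))  # {'price': 10.6, 'ticker': 'GOOG'}
```

The trade-off mirrors the row-wise case: reading one column is a single container access, while rebuilding a whole row touches every column.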

We have a more complete report about this, and the work to be done, here:

https://github.com/PyTables/proposal/blob/master/column-wise-pytables.rst

Many areas would benefit from this, but especially the pandas community,
because this should allow DataFrames to be represented on disk in a layout
closer to their in-memory representation, opening the door to a more faithful
disk-mapped representation of the DataFrame. Also, column-wise tables could
find a good niche in the financial world, where access by column is often
preferred over access to whole rows for efficiency reasons.


--
Francesc Alted