Discussion:
[pytables-users] Table.append problem
Thomas Marsh
2018-05-05 23:06:28 UTC
Hello,

I am having a problem I just can't solve. I am trying to concatenate a
large number of tables each with the same set of columns, which when
combined will add to ~1.6 billion rows. I am reading them in one by one and
using tables.Table.append to add each one in to the growing table. However
when I use tables.Table.append, the data of all but the initial table are
corrupted. Here is a stripped down version which shows the problem. If
someone can see what I am doing wrong I would be very grateful. I am
running tables 3.2.0 under python 3.4.6.

First, the code:

-----------------------------------------------------------------------------------------
import numpy as np
import tables

# Construct a numpy.recarray with one row of two elements

row1 = [(3572145, 5.46789e5)]
dt = np.dtype([('a', '>i4'), ('b', '>f4')])
arr = np.rec.array(row1, dt)

# print it
print('arr =',arr)

# open a tables file
h5file = tables.open_file('bug.h5',mode='w',title='Bug test')

# add a table based on the recarray just created
table = h5file.create_table(h5file.root, 'bug', arr, 'bug')

# append the same array to create a second row
table.append(arr)
table.flush()

# close the table
h5file.close()

# open it
h5file = tables.open_file('bug.h5')

# print the 'a' and 'b' columns: they should each have a pair of identical
# values, since the same row was repeated twice
tab = h5file.root.bug
print('a =',tab.col('a'))
print('b =',tab.col('b'))

h5file.close()
--------------------------------------------------------------------------

And then the output:

arr = [(3572145, 546789.)]
a = [ 3572145 -1316932096]
b = [ 5.46789000e+05 1.70470369e+10]

Why isn't a = [3572145 3572145], and similarly why isn't
b = [5.46789000e+05 5.46789000e+05]?

Tom
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Javier Sancho
2018-05-06 14:20:23 UTC
Thomas Marsh wrote:
> dt = np.dtype([('a', '>i4'), ('b', '>f4')])
You are using big-endian here, but look at what happens when you create
the table:

In [55]: dt
Out[55]: dtype([('a', '>i4'), ('b', '>f4')])

In [56]: tab._v_dtype
Out[56]: dtype([('a', '<i4'), ('b', '<f4')])

PyTables transforms datatypes to little-endian. It looks as though the
numbers aren't modified when you populate the table directly with
create_table, but the append method is undoubtedly affected.
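
The bad value reported for column 'a' is what you get if the big-endian
bytes are read back as little-endian without being swapped. A minimal
NumPy-only sketch of that reinterpretation (it does not call PyTables, so it
only illustrates the symptom, not where in append the problem occurs):

```python
import numpy as np

# The integer from the original post, stored as a big-endian 32-bit value.
big = np.array([3572145], dtype='>i4')

# Reinterpret the same four bytes as little-endian, without converting them.
misread = big.view('<i4')

print(misread[0])  # -1316932096, the second value seen in column 'a'
```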

If you create your array using little-endian, it runs without problem.
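
If the data already exists in big-endian form, one way to get a little-endian
copy is to cast it with NumPy before handing it to PyTables. This is only a
sketch: it uses a plain structured array rather than a recarray, and the
names dt_big, dt_little and arr_le are made up for the example.

```python
import numpy as np

# Big-endian structured array with the same fields as in the original post.
dt_big = np.dtype([('a', '>i4'), ('b', '>f4')])
arr = np.array([(3572145, 5.46789e5)], dtype=dt_big)

# Same field names and sizes, but little-endian byte order.
dt_little = dt_big.newbyteorder('<')

# astype converts the stored bytes, so the values are unchanged.
arr_le = arr.astype(dt_little)

print(arr_le.dtype)
print(arr_le['a'], arr_le['b'])
```

The resulting arr_le holds the same values as arr with little-endian fields,
which is the layout the table reports in _v_dtype above.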

An issue on GitHub (and a patch?) would be great ;-)
--
Javier Sancho Fernández - http://www.jsancho.org/
Associate Member of the Free Software Foundation - http://www.fsf.org/
Contra el DRM - http://www.defectivebydesign.org/what_is_drm
Thomas Marsh
2018-05-06 21:21:27 UTC
Ah, well spotted. I didn't realise that PyTables transformed to
little-endian. Do I understand correctly that it is the append method that
isn't working here? The tables I am reading are FITS format and big-endian,
so I had better convert them. I will endeavour to add something to GitHub
once I have things working; not sure I am sufficiently up-to-speed with
PyTables yet to dare offer a patch.

thanks for diagnosing this,

tom