Discussion:
[pytables-users] problem with concurrent reading and pytables
Diego Diaz Dominguez
2015-08-21 18:53:14 UTC
Hi all,

I am developing a tool that uses PyTables to query a huge tab-delimited
file stored in HDF5 format. To make things faster, I wrote my Python code
using the multiprocessing library (to run processes in parallel). Each
process queries the same HDF5 file through the PyTables interface. The query
basically selects the entries that fall within a range of values (the file
contains coordinates). Then some processing is done on the returned values.
When I run the code with a single process, I always get the same answer
(which is correct), but when I run the program with multiple processes, the
query returns different values on every run.
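A minimal sketch of the "one handle per worker" pattern, under assumptions: each worker opens the data file itself, after the worker process exists, instead of inheriting a handle the parent opened before forking. In PyTables terms the analogous step would presumably be calling `tables.open_file(path, mode="r")` inside the worker and querying with `Table.read_where(...)`; to stay runnable with the standard library alone, this sketch reads a plain tab-delimited text file instead of HDF5. The column layout (coordinate in the second column) and the names `count_in_range`, `_count_task` and `parallel_counts` are hypothetical.

```python
import multiprocessing as mp


def count_in_range(path, lo, hi):
    """Count rows whose coordinate (second tab-separated column) lies in [lo, hi).

    The file is opened inside this function, so every worker process gets its
    own handle instead of inheriting one that the parent opened before forking.
    """
    n = 0
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            if lo <= int(fields[1]) < hi:
                n += 1
    return n


def _count_task(args):
    # Pool.map hands each task a single (path, lo, hi) tuple.
    return count_in_range(*args)


def parallel_counts(path, ranges, processes=4):
    """Run one range query per (lo, hi) pair in a pool of worker processes."""
    # "fork" is POSIX-only; it keeps this sketch runnable as a plain script.
    # The parent never opens the data file, so no file handle crosses the fork.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        return pool.map(_count_task, [(path, lo, hi) for lo, hi in ranges])
```

For the same file and ranges, `parallel_counts` should return exactly the list obtained by calling `count_in_range` serially for each pair; a mismatch would suggest that something is shared between workers.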

The following shows the number of entries returned by the PyTables query
when I run the program with one process and when I run it with several
processes:

*one single process (-p 1)*

python Main.py pred ../tests/test_ref_2.vcf.gz ../tests/test_ind_2.vcf.gz ../tests/SalmoSalar_test.gff3 -u -p 1 -v 0

number of entries returned by the pytable query: 10


python Main.py pred ../tests/test_ref_2.vcf.gz ../tests/test_ind_2.vcf.gz ../tests/SalmoSalar_test.gff3 -u -p 1 -v 0

number of entries returned by the pytable query: 10


*multiple processes (-p 4)*

python Main.py pred ../tests/test_ref_2.vcf.gz ../tests/test_ind_2.vcf.gz ../tests/SalmoSalar_test.gff3 -u -t 4 -v 0

number of entries returned by the pytable query: 0


python Main.py pred ../tests/test_ref_2.vcf.gz ../tests/test_ind_2.vcf.gz ../tests/SalmoSalar_test.gff3 -u -t 4 -v 0

number of entries returned by the pytable query: 37

python Main.py pred ../tests/test_ref_2.vcf.gz ../tests/test_ind_2.vcf.gz ../tests/SalmoSalar_test.gff3 -u -t 4 -v 0

number of entries returned by the pytable query: 0

I think the problem may be concurrency, but I am not entirely sure about
that. Any advice on this matter would be appreciated.
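One concrete way concurrency could bite here, sketched under assumptions: on POSIX systems, a file descriptor opened before `fork()` refers to the same open file description in the parent and in every child, so they all share one file offset. If a library reads with a seek-then-read sequence on such an inherited descriptor, workers can move each other's read position between the seek and the read. Whether the HDF5/PyTables build in this thread reads that way is not established here; the sketch below (POSIX-only, standard library only, hypothetical function name) only demonstrates the shared offset itself.

```python
import os
import tempfile


def parent_offset_after_child_read(nbytes=5):
    """Show that a descriptor opened before fork() shares its file offset.

    The parent opens a file, forks, lets the child read `nbytes` bytes, then
    reports where the parent's own descriptor now points.
    """
    tmp_fd, path = tempfile.mkstemp()
    os.write(tmp_fd, b"0123456789")
    os.close(tmp_fd)
    fd = os.open(path, os.O_RDONLY)  # opened BEFORE fork: one shared open file
    try:
        pid = os.fork()
        if pid == 0:
            # Child: reading advances the offset it shares with the parent.
            os.read(fd, nbytes)
            os._exit(0)  # exit at once; do not run the parent's cleanup code
        os.waitpid(pid, 0)
        # The parent never read from fd, yet its offset has moved.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.remove(path)
```

Opening the file separately in each process (after the fork) gives every process its own offset, which is why per-worker opening is the usual recommendation.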

Thanks in advance!
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Diego Diaz Dominguez
2015-08-21 23:15:32 UTC
Just a quick update. Previously I was running my code on OS X. Now, for
testing purposes, I ran the code in a Linux environment instead, and there
I don't have any problems (I am sure the code is the same because I
downloaded it from my git repository). I think this may be a bug that only
happens on Mac systems.
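Since the symptom shows up on OS X but not on Linux, it may help to record how `multiprocessing` starts worker processes on each machine. The start method ("fork", "spawn" or "forkserver") depends on the platform and the Python version; for example, macOS defaults to "spawn" from Python 3.8 onward. This is only one candidate difference (the HDF5 build is another), `get_start_method()` requires Python 3.4 or later, and the function name `mp_environment` is hypothetical.

```python
import multiprocessing as mp
import platform
import sys


def mp_environment():
    """Collect platform facts worth comparing between the macOS and Linux runs."""
    return {
        "system": platform.system(),             # e.g. 'Darwin' or 'Linux'
        "python": sys.version.split()[0],         # interpreter version string
        "start_method": mp.get_start_method(),    # default way workers are started
        "available": mp.get_all_start_methods(),  # methods this platform supports
    }


if __name__ == "__main__":
    print(mp_environment())
```

Running this on both machines and comparing the output is a cheap first check before suspecting a PyTables or HDF5 bug.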