Kevad
2015-09-15 12:45:27 UTC
Hello,
I have some Twitter feeds loaded in a pandas Series that I would like to
store in an HDF5 file. Some of the tweets are in Norwegian and hence need to
be encoded. Since I am using Python 3.3.x, where strings are Unicode by
default, I guess I need not worry about that (?). Assuming PyTables supports
unicode columns (even though they are 'str') in Python 3, I saved them in an
HDF5 file. But when loading them back, some of the values are missing.
# A sample of the data
feeds[80:90]
80 BØR MAN STARTE en tweet med store bokstaver? F...
81 @NRKSigrid @audunlysbakken Har du husket Per S...
82 Lurer på om IS har fått med seg kaoset ved Eur...
83 synes han hørte på P3 at Opoku uttales Opoko. ...
84 De statsbærende partiene Ap og Høyre må ta sky...
85 April 2014. Blir MDG det nye arbeider @partiet...
86 MDG: Hasj for kjøtt. #valg2015
87 Grønt skifte.. https://t.co/OuM8quaMz0
88 Kinderegg https://t.co/AsECmw2sV9
89 MDG for honning, frukt og grønt. https://t.co/...
Name: feeds, dtype: object
store = pd.HDFStore('feed.hd5')
store.append('feed', feeds[84:86], min_itemsize=200)
store.close()
pd.read_hdf('feed.hd5', 'feed')
84
85 April 2014. Blir MDG det nye arbeider @partiet...
Name: feeds, dtype: object
feeds[84:86].to_hdf('feed.hd5', 'feed', format='table',
                    data_columns=True)
pd.read_hdf('feed.hd5', 'feed')
But if I change the slice to, say, *[84:87]*, the value of the *84th* row is now loaded.
feeds[84:87].to_hdf('feed.hd5', 'feed', format='table',
                    data_columns=True)
res = pd.read_hdf('feed.hd5', 'feed')
res
84 De statsbærende partiene Ap og Høyre må ta sky...
85 April 2014. Blir MDG det nye arbeider @partiet...
86 MDG: Hasj for kjøtt. #valg2015
Name: feeds, dtype: object
But the loaded string is missing some characters when compared with the
original:
# Original tweet (Length: 140)
print(feeds[84])
De statsbærende partiene Ap og Høyre må ta skylda for Miljøpartiets
fremgang. Velgerne har sett at SV og V ikke vinner frem i miljøspørsmål.
# tweet after loading from HDF5 file (Length: 134)
print(res[84])
De statsbærende partiene Ap og Høyre må ta skylda for Miljøpartiets
fremgang. Velgerne har sett at SV og V ikke vinner frem i miljøspø
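One check that may help narrow this down is to compare the tweet's character count with its UTF-8 byte count. The following is a minimal sketch in plain Python, assuming the tweet is exactly the two printed lines joined by a single space and spelled with the Norwegian letters æ, ø and å. Cutting the encoded bytes at the character count is only a hypothesis about where the loss could come from; nothing in the output here confirms that pandas or PyTables sizes the column this way.

```python
# The tweet from row 84, with the wrapped line joined by a space (assumed).
tweet = (
    "De statsbærende partiene Ap og Høyre må ta skylda for Miljøpartiets "
    "fremgang. Velgerne har sett at SV og V ikke vinner frem i miljøspørsmål."
)

n_chars = len(tweet)              # number of Unicode characters
encoded = tweet.encode("utf-8")   # æ, ø and å take two bytes each in UTF-8
n_bytes = len(encoded)

# Hypothetical truncation: keep only as many bytes as there are characters,
# then decode. errors="ignore" would drop a trailing partial character.
truncated = encoded[:n_chars].decode("utf-8", errors="ignore")

print(n_chars, n_bytes, len(truncated))   # 140 147 134
print(truncated[-8:])                     # miljøspø
```

The 134 characters that survive a 140-byte cut match the length and the ending of the string read back from the file, which suggests the stored width may be counted in characters while the data is written as bytes.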
Can anyone explain this and let me know how I can avoid it?
I am using
Pandas: 0.16.2, Python: 3.3.5, PyTables: 3.2.0
Thanks.
--
You received this message because you are subscribed to the Google Groups "pytables-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pytables-users+***@googlegroups.com.
To post to this group, send an email to pytables-***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.