I am trying to select rows by specifying the value of one of the columns. That works perfectly well as long as the value is pure ASCII. If, however, it contains non-ASCII characters, I cannot get it to work, no matter how I encode the value.
Simplified example to illustrate the problem:
>>> from __future__ import (absolute_import, division,
...                         print_function, unicode_literals)
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 'Stuttgart'], [2, 'München']], columns=['id', 'city'])
>>> df['city'] = df['city'].map(lambda x: x.encode('latin-1'))
>>> store = pd.HDFStore('test_store.h5')
>>> store.append('test_key', df, data_columns=True)
>>> store['test_key']
   id       city
0   1  Stuttgart
1   2    M�nchen
Note that the non-ASCII string is indeed stored correctly:
>>> store['test_key']['city'][1]
'M\xfcnchen'
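For reference, `'M\xfcnchen'` is exactly the latin-1 encoding of `'München'`: latin-1 stores `ü` as the single byte 0xFC, whereas UTF-8 uses the two-byte sequence 0xC3 0xBC. A quick check, written in Python 3 syntax (unlike the Python 2 session above):

```python
# Compare the byte representations of the same city name
# under the two encodings discussed in this question.
name = 'München'

latin1_bytes = name.encode('latin-1')  # one byte per character
utf8_bytes = name.encode('utf-8')      # 'ü' becomes two bytes

print(latin1_bytes)                        # b'M\xfcnchen'
print(utf8_bytes)                          # b'M\xc3\xbcnchen'
print(len(latin1_bytes), len(utf8_bytes))  # 7 8

# The latin-1 bytes are not valid UTF-8, so decoding them as UTF-8 fails.
try:
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    print('not valid UTF-8:', exc.reason)  # not valid UTF-8: invalid start byte
```

This does not by itself explain why the query fails, but it shows that the two encodings produce different byte strings, so a `where` condition has to match the bytes that were actually written.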
Selecting on an ASCII value works just fine:
>>> store.select('test_key', where='city==%r' % 'Stuttgart')
   id       city
0   1  Stuttgart
But selecting on the non-ASCII value returns no rows, whether I pass the unicode string or its latin-1 encoding:
>>> store.select('test_key', where='city==%r' % 'München')
Empty DataFrame
Columns: [id, city]
Index: []
>>> store.select('test_key', where='city==%r' % 'München'.encode('latin-1'))
Empty DataFrame
Columns: [id, city]
Index: []
Clearly I am doing something wrong. How does one solve this?
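One workaround that sidesteps the `where` query entirely, assuming the table is small enough to load into memory, is to read it with `store.select('test_key')` and filter in pandas. The sketch below (Python 3 syntax) builds the DataFrame directly as a stand-in for that result, decodes the latin-1 bytes with `Series.str.decode`, and compares against an ordinary string:

```python
import pandas as pd

# Stand-in for the result of store.select('test_key'):
# the 'city' column holds latin-1 encoded bytes, as in the question.
df = pd.DataFrame(
    {'id': [1, 2],
     'city': ['Stuttgart'.encode('latin-1'), 'München'.encode('latin-1')]}
)

# Decode the bytes back to text, then filter in memory.
city_text = df['city'].str.decode('latin-1')
result = df[city_text == 'München']

print(result['id'].tolist())  # [2]
```

The trade-off is that this gives up the on-disk filtering that `data_columns=True` is meant to enable, so it only makes sense when reading the whole table is cheap.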
Oddly, selection works fine if the strings are encoded as UTF-8 instead of latin-1:
from __future__ import (absolute_import, division,
print_function, unicode_literals)
import pandas as pd
df = pd.DataFrame([[1, 'Stuttgart'], [2, 'München']], columns=['id', 'city'])
df['city'] = df['city'].map(lambda x: x.encode('utf-8'))
store = pd.HDFStore('/tmp/test_store.h5', 'w')
store.append('test_key', df, data_columns=True)
print(store.select('test_key', where='city==%r' % 'Stuttgart'.encode('utf-8')))
#    id       city
# 0   1  Stuttgart
print(store.select('test_key', where='city==%r' % 'München'.encode('utf-8')))
#    id     city
# 1   2  München
store.close()