I'm reading in a large flat file of timestamped data with multiple columns. One column is boolean: it can be true/false or have no entry (which is read as NaN).
When I read the CSV, the boolean column is cast to object, which prevents saving the data to an HDFStore because of a serialization error.
Example data:
A B C D
a 1 2 true
b 5 7 false
c 3 2 true
d 9 4
I use the following command to read it:
import pandas as pd
pd.read_csv('data.csv', parse_dates=True)
One solution is to specify the dtype while reading the CSV, but I was hoping for a more succinct solution, like convert_objects, where I can specify parse_numeric or parse_dates.
You can use dtype, which accepts a dictionary mapping columns to types:
dtype : Type name or dict of column -> type Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
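import io
import pandas as pd
# using your sample (io.StringIO, because io.BytesIO rejects a str on Python 3)
csv_file = io.StringIO('''
A B C D
a 1 2 true
b 5 7 false
c 3 2 true
d 9 4''')
# a plain bool column cannot hold NaN, so read D as pandas' nullable
# 'boolean' dtype (this assumes pandas >= 1.0)
df = pd.read_csv(csv_file, sep=r'\s+', dtype={'D': 'boolean'})
# then fill the missing entry with False and cast to plain bool
df['D'] = df['D'].fillna(False).astype(bool)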
df
A B C D
0 a 1 2 True
1 b 5 7 False
2 c 3 2 True
3 d 9 4 False
df.D.dtypes
dtype('bool')
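If setting dtype at read time is awkward, the same result can probably be reached after loading, which is closer to the convert-after-reading workflow the question asks about. The following is a minimal sketch, assuming pandas 1.0 or later (for the nullable 'boolean' dtype) and assuming that a missing entry should count as False; the variable names are only illustrative.

```python
import io
import pandas as pd

# Sample data from the question; row "d" has no entry in column D.
raw = """A B C D
a 1 2 true
b 5 7 false
c 3 2 true
d 9 4
"""

# No dtype is given, so D holds True/False plus NaN and is read as object.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Convert after the fact: object -> nullable boolean -> fill the gap -> plain bool.
# Treating the missing entry as False is an assumption; pick what fits your data.
df["D"] = df["D"].astype("boolean").fillna(False).astype(bool)

print(df.dtypes)
```

Going through the nullable dtype first keeps the missing value explicit until you decide how to fill it. If the column should keep its missing entries, stopping after astype("boolean") is an option, though whether HDFStore can serialize that dtype is something to check against your pandas and PyTables versions.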