pandas: read_csv how to force bool data to dtype bool instead of object

Question

I'm reading in a large flat file of timestamped data with multiple columns. One column is boolean: it can be true/false or have no entry (which is read as NaN).

When reading the CSV, the bool column is typecast as object, which prevents saving the data to an HDFStore because of a serialization error.

example data:

A    B    C    D
a    1    2    true
b    5    7    false
c    3    2    true
d    9    4

I use the following command to read

import pandas as pd
pd.read_csv('data.csv', parse_dates=True)
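
A minimal reproduction of the symptom, assuming the sample above is whitespace-delimited and held in an in-memory string rather than in data.csv (the real file's delimiter may differ):

```python
import io
import pandas as pd

# Assumption: the sample is whitespace-delimited; the real file may differ.
sample = """A    B    C    D
a    1    2    true
b    5    7    false
c    3    2    true
d    9    4
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# D mixes parsed booleans with a missing value, so pandas falls back to object.
print(df.dtypes)
print(df["D"].isna().sum())  # number of rows with no entry in D
```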

One solution is to specify the dtype while reading the CSV, but I was hoping for a more succinct solution, like convert_objects, where I can specify convert_numeric or convert_dates.

Anzel · Accepted Answer

You can use the dtype argument, which accepts a dictionary mapping column names to types:

dtype : Type name or dict of column -> type
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}

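import io
import pandas as pd

# using your sample (io.StringIO, since read_csv expects text on Python 3)
csv_file = io.StringIO('''
A    B    C    D
a    1    2    true
b    5    7    false
c    3    2    true
d    9    4''')

# 'boolean' is the nullable dtype (pandas 1.0+); plain bool can raise
# "Bool column has NA values" on recent versions because row d has no D
df = pd.read_csv(csv_file, sep=r'\s+', dtype={'D': 'boolean'})
# then fillna to convert the missing value to False and cast to NumPy bool
df['D'] = df['D'].fillna(False).astype(bool)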

df 
   A  B  C      D
0  a  1  2   True
1  b  5  7  False
2  c  3  2   True
3  d  9  4  False

df.D.dtypes
dtype('bool')
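
If naming every boolean column in dtype is too verbose, the conversion can also be done after reading. The following is only a sketch: coerce_bool_columns is a hypothetical helper, not part of pandas, and it assumes a column counts as boolean when all of its non-missing values are already parsed booleans.

```python
import io

import numpy as np
import pandas as pd


def coerce_bool_columns(df, fill=False):
    """Hypothetical helper: cast object columns holding only booleans
    and missing values to dtype bool, replacing missing values with `fill`."""
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # numeric, datetime and string-dtype columns are left alone
        present = out[col].dropna()
        if len(present) == 0:
            continue  # nothing to infer from an all-missing column
        if present.map(lambda v: isinstance(v, (bool, np.bool_))).all():
            filled = out[col].map(lambda v: fill if pd.isna(v) else v)
            out[col] = filled.astype(bool)
    return out


# Same sample as in the question, read without any dtype hint.
csv_file = io.StringIO('''
A    B    C    D
a    1    2    true
b    5    7    false
c    3    2    true
d    9    4''')

raw = pd.read_csv(csv_file, sep=r'\s+')
result = coerce_bool_columns(raw, fill=False)
print(result.dtypes)
```

Note that filling with False discards the difference between "false" and "no entry"; if that difference matters, keeping the nullable 'boolean' dtype instead of casting to bool may be the better choice.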

Tags: python, pandas

Prasanjit Prakash
