Pandas to_csv prefixing 'b' when doing .astype('|S') on column

Question

I'm following the advice of this article to reduce Pandas DataFrame memory usage by calling .astype('|S') on object columns, like so:

data_frame['COLUMN1'] = data_frame['COLUMN1'].astype('|S')
data_frame['COLUMN2'] = data_frame['COLUMN2'].astype('|S')

Performing this on the DataFrame cuts memory usage by 20-40% without negative impacts on processing the columns. However, when outputting the file using .to_csv():

data_frame.to_csv(filename, sep='\t', encoding='utf-8')

The columns converted with .astype('|S') are written with a b prefix and single quotes:

b'00001234'  b'Source'

Removing the .astype('|S') call and outputting to csv gives the expected behavior:

00001234  Source

Googling this issue turns up some GitHub issues, but I don't think they are related (and they appear to have been fixed): "to_csv and bytes on Python 3" and "BUG: Fix default encoding for CSVFormatter.save".

I'm on Python 3.6.4 and Pandas 0.22.0, and I confirmed the behavior is the same on both macOS and Windows. Any advice on how to output the columns without the b prefix and single quotes?

Milton Arango G · Accepted Answer

The 'b' prefix is how Python 3 displays a bytes literal, which means the column holds bytes objects rather than Unicode strings. To remove the prefix, decode each bytes object with its decode method before saving to a csv file:

data_frame['COLUMN1'] = data_frame['COLUMN1'].apply(lambda s: s.decode('utf-8'))
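For a fuller picture, here is a minimal sketch of the same idea applied to both columns, using the vectorized .str.decode accessor instead of apply. The sample values and the in-memory buffer are assumptions made for illustration (the bytes columns are built directly rather than via .astype('|S')), so treat it as a sketch rather than a drop-in fix:

```python
import io

import pandas as pd

# Hypothetical sample data: the columns hold bytes objects, as they
# would after the conversion described in the question.
data_frame = pd.DataFrame({
    "COLUMN1": [b"00001234", b"00005678"],
    "COLUMN2": [b"Source", b"Target"],
})

# str() on a bytes object gives its repr, which is where the
# b'...' text in the output file comes from.
print(str(b"Source"))  # b'Source'

# Decode each bytes column back to str before writing.
for column in ["COLUMN1", "COLUMN2"]:
    data_frame[column] = data_frame[column].str.decode("utf-8")

# Write to an in-memory buffer here; a filename works the same way.
buffer = io.StringIO()
data_frame.to_csv(buffer, sep="\t", index=False)
tsv_text = buffer.getvalue()
print(tsv_text)
```

If the memory savings matter during processing, one option is to decode only at write time, on a copy of the frame, so the working DataFrame keeps its bytes columns.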

Tags:

python

pandas

Brett VanderHaar
