 

How to translate "bytes" objects into literal strings in pandas Dataframe, Python3.x?

I have a pandas DataFrame in Python 3.x in which certain columns hold strings that are stored as bytes objects (like str in Python 2.x):

    import pandas as pd
    df = pd.DataFrame(...)
    df
           COLUMN1         ....
    0      b'abcde'        ....
    1      b'dog'          ....
    2      b'cat1'         ....
    3      b'bird1'        ....
    4      b'elephant1'    ....

When I access by column with df.COLUMN1, I see Name: COLUMN1, dtype: object.

However, if I access a single element, it is a "bytes" object:

    df.COLUMN1.ix[0].dtype
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'bytes' object has no attribute 'dtype'

How do I convert these into "regular" strings? That is, how can I get rid of this b'' prefix?

ShanZhengYang asked Nov 02 '16 21:11




1 Answer

You can use vectorised str.decode to decode byte strings into ordinary strings:

    df['COLUMN1'].str.decode("utf-8")
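Note that str.decode returns a new Series rather than modifying the column in place, so the result has to be assigned back. A minimal sketch, assuming a small made-up frame that mirrors the question's COLUMN1 and that the bytes are UTF-8 encoded:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's COLUMN1.
df = pd.DataFrame({"COLUMN1": [b"abcde", b"dog", b"cat1"]})

# .str.decode returns a new Series, so assign it back to keep the result.
# Assumption: the bytes are UTF-8; pass a different encoding if they are not.
df["COLUMN1"] = df["COLUMN1"].str.decode("utf-8")

# Each element is now a Python str, so the b'' prefix is gone.
print(type(df["COLUMN1"].iloc[0]))
print(df["COLUMN1"].tolist())
```

If the data came from a source with another encoding (for example latin-1), decoding as UTF-8 may raise an error or give wrong characters, so it is worth checking where the bytes originated.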

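To do this for multiple columns, you can select just the object-dtype columns (which is where the byte strings live):

    # np.object has been removed from recent NumPy releases; the builtin object is equivalent here
    str_df = df.select_dtypes([object])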

Convert all of them:

    str_df = str_df.stack().str.decode('utf-8').unstack()

You can then replace the original df columns with the converted ones:

    for col in str_df:
        df[col] = str_df[col]
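The multi-column steps above can also be done one column at a time, which avoids touching object columns that already hold ordinary strings. This is a variant of the stack/unstack approach, not the same code: the helper name decode_bytes_columns and the sample frame are made up for illustration, and it assumes every bytes column uses the same encoding.

```python
import pandas as pd


def decode_bytes_columns(df, encoding="utf-8"):
    """Return a copy of df with all-bytes object columns decoded to str.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    for col in out.select_dtypes(include=[object]).columns:
        values = out[col].dropna()
        # Only decode columns whose non-null values are all bytes objects.
        if len(values) > 0 and values.map(lambda v: isinstance(v, bytes)).all():
            out[col] = out[col].str.decode(encoding)
    return out


# Hypothetical sample data: two bytes columns and one numeric column.
df = pd.DataFrame({
    "COLUMN1": [b"abcde", b"dog", b"cat1"],
    "COLUMN2": [b"x", b"y", b"z"],
    "count": [1, 2, 3],
})

decoded = decode_bytes_columns(df)
print(decoded)
```

Working on a copy keeps the original frame unchanged, so the decoded result can be compared against the raw bytes before overwriting anything.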
EdChum answered Oct 08 '22 09:10