How to translate "bytes" objects into literal strings in pandas Dataframe, Python3.x?

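I have a Python 3.x pandas DataFrame in which certain columns are strings that are expressed as bytes (as in Python 2.x):

import pandas as pd
df = pd.DataFrame(...)
df
       COLUMN1         ....
0      b'abcde'        ....
1      b'dog'          ....
2      b'cat1'         ....
3      b'bird1'        ....
4      b'elephant1'    ....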

When I access the column with df.COLUMN1, I see Name: COLUMN1, dtype: object.

However, if I access a single element, it is a "bytes" object:

df.COLUMN1.ix[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'dtype'

How do I convert these into "regular" strings? That is, how can I get rid of this b'' prefix?

911

asked Nov 02 '16 21:11

ShanZhengYang

1 Answer

You can use vectorised str.decode to decode byte strings into ordinary strings:

df['COLUMN1'].str.decode("utf-8")
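Note that str.decode returns a new Series and does not change the DataFrame in place, so the result has to be assigned back. A minimal sketch, assuming a small sample frame made up for illustration:

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({"COLUMN1": [b"abcde", b"dog", b"cat1"]})

# Decode the bytes values and assign the result back to the column.
df["COLUMN1"] = df["COLUMN1"].str.decode("utf-8")

print(df["COLUMN1"].tolist())
print(type(df["COLUMN1"].iloc[0]))
```

After the assignment, each element should be an ordinary str, with no b'' prefix.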

To do this for multiple columns, you can select just the object-dtype columns, which is where the byte strings are stored:

str_df = df.select_dtypes([object])

convert all of them:

str_df = str_df.stack().str.decode('utf-8').unstack()

You can then swap out converted cols with the original df cols:

for col in str_df:
    df[col] = str_df[col]
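Putting the steps together, a minimal end-to-end sketch might look like the following. The sample frame, including the COLUMN2 and COUNT columns, is invented for illustration. The sketch decodes column by column instead of going through stack/unstack, which should give the same result for columns that contain only bytes.

```python
import pandas as pd

# Hypothetical sample data: two bytes columns and one numeric column.
df = pd.DataFrame({
    "COLUMN1": [b"abcde", b"dog", b"cat1"],
    "COLUMN2": [b"x", b"y", b"z"],
    "COUNT": [1, 2, 3],
})

# Select only the object-dtype columns; the numeric column is left alone.
str_df = df.select_dtypes([object])

# Decode each selected column and write it back into the original frame.
for col in str_df:
    df[col] = str_df[col].str.decode("utf-8")

print(df)
```

One caveat worth checking on real data: str.decode returns NaN for values that are already str, so object columns that mix bytes and ordinary strings need separate handling.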

104

answered Oct 08 '22 09:10

EdChum

Tags:

python

arrays

python-3.x

pandas

byte
