Removing b'' from string column in a pandas dataframe

Tags: python, string, pandas, dataframe

I have a data frame taken from the SDSS database. Example data is here: https://i.stack.imgur.com/9fiNu.png

I want to remove the character 'b' from data['class']. I tried

data['class'] = data['class'].replace('b', '')

But I am not getting the result.

asked Oct 11 '17 20:10

John Singh

2 Answers

You're working with byte strings. You might consider str.decode:

data['class'] = data['class'].str.decode('utf-8')
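As a minimal sketch of the above: the `b` is not a character inside the value but the prefix Python prints for `bytes` objects, which is why replacing `'b'` changes nothing. The sample values below are made up to mimic the SDSS `class` column and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample: a 'class' column holding bytes, as in the question
data = pd.DataFrame({"class": [b"GALAXY", b"STAR", b"QSO"]})

# Each element is a bytes object, so it prints with a b'' prefix
print(type(data["class"].iloc[0]))

# Decode every element from bytes to str
data["class"] = data["class"].str.decode("utf-8")

decoded = data["class"].tolist()
print(decoded)
```

After decoding, the column holds ordinary strings, so later string comparisons such as `data['class'] == 'STAR'` behave as expected.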

answered Oct 10 '22 17:10

cs95

Further explanation:

import pandas as pd

df = pd.DataFrame([b'123']) # create a DataFrame with a bytes (b'') element

Now we can call

df[0].str.decode('utf-8') # returns a pd.Series with decode applied to each element
df[0].decode('utf-8') # calls decode on the Series itself and raises AttributeError

Basically, the .str accessor applies the method to every element of the Series. It could also be written like this:

df[0].apply(lambda x: x.decode('utf-8'))
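If the column might mix bytes with values that are already strings or missing, the plain lambda above would fail on the non-bytes elements. One possible variation on the `apply` approach is sketched below; `decode_if_bytes` is a hypothetical helper name, and the sample Series is invented for illustration.

```python
import pandas as pd


def decode_if_bytes(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes-like values, pass anything else through."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value


# Made-up column mixing bytes, an already-decoded string, and a missing value
mixed = pd.Series([b"GALAXY", "STAR", None], dtype=object)

cleaned = mixed.apply(decode_if_bytes)
print(cleaned.tolist())
```

The type check keeps the call safe to run more than once on the same column, since values that were already decoded are left untouched.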

answered Oct 10 '22 17:10

Anton vBR
