
Removing b'' from string column in a pandas dataframe

I have a DataFrame taken from the SDSS database. Example data is here:

[image: example data]

I want to remove the character 'b' from data['class']. I tried:

data['class'] = data['class'].replace('b', '')

But I am not getting the expected result.

John Singh asked Oct 11 '17 20:10




2 Answers

You're working with byte strings. You might consider str.decode:

data['class'] = data['class'].str.decode('utf-8') 
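
As a minimal sketch of why the `replace` call from the question has no effect while `str.decode` does, assume a small made-up frame whose `class` column holds byte strings. The values below are hypothetical and are not taken from the SDSS data:

```python
import pandas as pd

# Hypothetical stand-in for the SDSS data: 'class' holds bytes objects,
# which is why the values display with a b'' prefix
data = pd.DataFrame({"class": [b"STAR", b"GALAXY", b"QSO"]})

# Series.replace matches whole values, and no value equals the str 'b',
# so the column comes back unchanged
unchanged = data["class"].replace("b", "")

# Decoding turns each bytes object into a regular str
decoded = data["class"].str.decode("utf-8")
```

The b'' is not a character stored in the data; it is how Python displays a bytes object, so decoding is what makes it disappear.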
cs95 answered Oct 10 '22 17:10


Further explanation:

import pandas as pd
df = pd.DataFrame([b'123'])  # create a DataFrame with a single bytes element

Now we can call

df[0].str.decode('utf-8')  # returns a pd.Series with decode applied to each element
df[0].decode('utf-8')  # raises AttributeError: a Series itself has no decode method

Basically, the .str accessor applies the string method to every element of the Series. It could also be written like this:

df[0].apply(lambda x: x.decode('utf-8')) 
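
The three calls can be compared side by side in a short sketch. The `decode_if_bytes` helper and the `mixed` Series are hypothetical additions for illustration; they guard against the case where some values are already `str`, on which the plain lambda would raise `AttributeError`:

```python
import pandas as pd

df = pd.DataFrame([b"123"])  # same one-element frame as in the answer

# The element-wise accessor and the explicit lambda give the same values
via_accessor = df[0].str.decode("utf-8")
via_apply = df[0].apply(lambda x: x.decode("utf-8"))

# Calling decode on the Series itself fails: Series has no such attribute
try:
    df[0].decode("utf-8")
    series_decode_failed = False
except AttributeError:
    series_decode_failed = True


def decode_if_bytes(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes, return any other value unchanged."""
    return value.decode(encoding) if isinstance(value, bytes) else value


mixed = pd.Series([b"123", "456"])  # made-up column mixing bytes and str
cleaned = mixed.apply(decode_if_bytes)
```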
Anton vBR answered Oct 10 '22 17:10