I have a data frame as taken from SDSS database. Example data is here.
I want to remove the character 'b' from data['class']
. I tried
data['class'] = data['class'].replace("b','')
But I am not getting the result.
To remove characters from columns in Pandas DataFrame, use the replace (~) method. Consider the following DataFrame: df = pd. DataFrame ({"A": ["a","ab","cc"]})
Another option you have when it comes to removing unwanted parts from strings in pandas, is pandas.Series.str.extract () method that is used to extract capture groups in the regex pat as columns in a DataFrame. In our example, we will simply extract the parts of the string we wish to keep:
To remove numbers from string, we can use replace () method and simply replace. Let us first import the require library − Create DataFrame with student records. The Id column is having string with numbers − Remove number from strings of a specific column i.e. “Id” here − dataFrame ['Id'] = dataFrame ['Id'].str. replace ('\d+', '')
When working with pandas we usually need to perform some pre-processing tasks in order to transform the data into a desired form. One common task that is usually required as part of this step involves the transformation of string columns in a way that we eliminate some unwanted parts.
You're working with byte strings. You might consider str.decode
:
data['class'] = data['class'].str.decode('utf-8')
Further explanation:
df = pd.DataFrame([b'123']) # create dataframe with b'' element
Now we can call
df[0].str.decode('utf-8') # returns a pd.series applying decode on str succesfully
df[0].decode('utf-8') # tries to decode the series and throws an error
Basically what you are doing with .str() is applying it for all elements. It could also be written like this:
df[0].apply(lambda x: x.decode('utf-8'))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With