How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?

Tags:

python

pandas

dataframe

I have a dataframe which looks like this:

     A       B           C
1   red78   square    big235
2   green   circle    small123
3   blue45  triangle  big657

I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:

     A       B           C
1   red78   square    235
2   green   circle    123
3   blue45  triangle  657

I tried the following, but I get the error "expected string or buffer":

import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)

What should I do instead?

Code to create the dataframe:

import pandas as pd

# DataFrame.set_value was removed in pandas 1.0; build the same frame directly
dfObject = pd.DataFrame(
    {'A': ['red78', 'green', 'blue45'],
     'B': ['square', 'circle', 'triangle'],
     'C': ['big235', 'small123', 'big657']},
    index=[1, 2, 3],
)
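The error comes from the order of evaluation: in the attempt above, re.sub runs immediately, with the whole Series as its string argument, before apply is ever called. A minimal sketch of the difference, using only column C of the sample data (on Python 3 the message reads "expected string or bytes-like object" instead of "expected string or buffer"):

```python
import re

import pandas as pd

df = pd.DataFrame({'C': ['big235', 'small123', 'big657']}, index=[1, 2, 3])

# Calling re.sub directly on the Series fails: re expects a single string.
raised = False
try:
    re.sub('[^0-9]', '', df['C'])
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

# Wrapping the call in a function lets apply run re.sub once per value.
# Series.apply takes no axis argument (extra keyword arguments are forwarded
# to the function), so axis=0 is dropped here.
df['C'] = df['C'].apply(lambda value: re.sub('[^0-9]', '', value))
print(df['C'].tolist())  # ['235', '123', '657']
```

The vectorised .str methods in the answers avoid the per-row lambda entirely.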


asked May 22 '17 15:05

ag14

3 Answers

Use str.extract and pass a regex pattern to extract just the numeric parts:

In[40]:
dfObject['C'] = dfObject['C'].str.extract(r'(\d+)', expand=False)
dfObject

Out[40]: 
        A         B    C
1   red78    square  235
2   green    circle  123
3  blue45  triangle  657

If needed you can cast to int:

dfObject['C'] = dfObject['C'].astype(int)
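Two behaviours of str.extract are worth checking before casting: only the first run of digits in each value is captured, and values with no digits become NaN, which makes astype(int) raise. A sketch, assuming two hypothetical extra values ('a1b2' and 'no_digits') that are not in the question's data; pd.to_numeric with errors='coerce' plus the nullable Int64 dtype is one way to keep such rows:

```python
import pandas as pd

# 'a1b2' and 'no_digits' are hypothetical values added to show edge cases
s = pd.Series(['big235', 'small123', 'big657', 'a1b2', 'no_digits'])

# extract returns the first match of the capture group per row, or NaN
extracted = s.str.extract(r'(\d+)', expand=False)
print(extracted.tolist())  # ['235', '123', '657', '1', nan]

# astype(int) would raise on the NaN, so convert leniently to float first
# and then to the nullable Int64 dtype, which stores the gap as <NA>
numbers = pd.to_numeric(extracted, errors='coerce').astype('Int64')
print(numbers.tolist())
```

If every value is known to contain digits, the plain astype(int) above is enough.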

answered Oct 18 '22 02:10

EdChum

To remove all non-digit characters from strings in a Pandas column you should use str.replace with \D+ or [^0-9]+ patterns:

dfObject['C'] = dfObject['C'].str.replace(r'\D+', '', regex=True)

Note that in Python 3, \D is fully Unicode-aware by default and thus does not match non-ASCII digits (like ۱۲۳۴۵۶۷۸۹), so those digits are kept. If you want to keep only the ASCII digits 0-9, use

dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '', regex=True)

So,

import re
print(re.sub(r'\D+', '', '1۱۲۳۴۵۶۷۸۹0'))      # => 1۱۲۳۴۵۶۷۸۹0
print(re.sub(r'[^0-9]+', '', '1۱۲۳۴۵۶۷۸۹0'))  # => 10
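On pandas 2.0 and later, str.replace treats the pattern as a literal string unless regex=True is passed. The sketch below compares the patterns on a Series, assuming a hypothetical value 'x1۱۲۳0' that mixes ASCII and Extended Arabic-Indic digits. Passing flags=re.ASCII is another way to make \D match everything except 0-9; the Series is created with dtype=object so that Python's re module does the matching.

```python
import re

import pandas as pd

# 'x1۱۲۳0' is a hypothetical value mixing ASCII and non-ASCII digits
s = pd.Series(['big235', 'small123', 'x1۱۲۳0'], dtype=object)

# Unicode-aware \D keeps the Extended Arabic-Indic digits
unicode_aware = s.str.replace(r'\D+', '', regex=True)
# re.ASCII narrows \d to [0-9], so \D also removes the non-ASCII digits
ascii_only = s.str.replace(r'\D+', '', regex=True, flags=re.ASCII)
# The explicit character class behaves like the ASCII-only version
bracket = s.str.replace(r'[^0-9]+', '', regex=True)

print(unicode_aware.tolist())  # ['235', '123', '1۱۲۳0']
print(ascii_only.tolist())     # ['235', '123', '10']
print(bracket.tolist())        # ['235', '123', '10']
```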

answered Oct 18 '22 02:10

Wiktor Stribiżew

You can use .str.replace with a regex:

dfObject['C'] = dfObject.C.str.replace(r"[a-zA-Z]", '', regex=True)

output:

        A         B    C
1   red78    square  235
2   green    circle  123
3  blue45  triangle  657
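Removing only [a-zA-Z] works for this data, but any other non-digit character (spaces, punctuation, accented letters) is left in place. A sketch with hypothetical values 'big 235' and 'cost$99', not taken from the question, compared against the [^0-9]+ pattern (regex=True is needed on pandas 2.0 and later):

```python
import pandas as pd

# 'big 235' and 'cost$99' are hypothetical values, not from the question
s = pd.Series(['big235', 'big 235', 'cost$99'])

# Only ASCII letters are stripped; the space and '$' survive
letters_removed = s.str.replace(r'[a-zA-Z]', '', regex=True)
# Everything that is not 0-9 is stripped
non_digits_removed = s.str.replace(r'[^0-9]+', '', regex=True)

print(letters_removed.tolist())     # ['235', ' 235', '$99']
print(non_digits_removed.tolist())  # ['235', '235', '99']
```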

answered Oct 18 '22 02:10

Scott Boston
