Drop non-numeric columns from a pandas DataFrame [duplicate]

Tags:

python

pandas

In my application I load text files that are structured as follows:

- A first, non-numeric column (ID)
- A number of non-numeric columns (strings)
- A number of numeric columns (floats)

The number of the non-numeric columns is variable. Currently I load the data into a DataFrame like this:

source = pandas.read_table(inputfile, index_col=0)

I would like to drop all non-numeric columns in one fell swoop, without knowing their names or indices, since this should be doable by reading their dtypes. Is this possible with pandas, or do I have to cook up something on my own?
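To see why the dtype is enough to tell the columns apart, here is a minimal sketch that loads a small, made-up tab-separated sample. The in-memory `io.StringIO` file and the column names `name`, `city`, `x`, `y` are hypothetical stand-ins for `inputfile`. String columns typically come back as `object` dtype (or a dedicated string dtype in newer pandas releases), while the numbers come back as `float64`.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for `inputfile`:
# an ID column, two string columns, two float columns.
sample = io.StringIO(
    "ID\tname\tcity\tx\ty\n"
    "a1\tfoo\tOslo\t1.5\t2.0\n"
    "a2\tbar\tBergen\t3.25\t4.0\n"
)

# read_table splits on tabs by default; index_col=0 turns ID into the index.
source = pd.read_table(sample, index_col=0)

# Inspect the inferred dtype of each remaining column.
print(source.dtypes)
```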


asked Oct 04 '12 10:10

Einar

2 Answers

To avoid using a private method, you can also use select_dtypes, which lets you either include or exclude the dtypes you want.

I ran into it in a post about the exact same thing.

In your case, specifically:
source.select_dtypes(['number']) or, with numpy imported as np, source.select_dtypes([np.number])
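A slightly fuller sketch of the same idea, using a made-up frame shaped like the one in the question (an ID index, two string columns, two float columns; all names are hypothetical). It shows both spellings with the `include=` keyword, plus an explicit "drop" variant that uses `exclude=` to find the non-numeric column names first.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the question's layout (ID already in the index).
source = pd.DataFrame(
    {"name": ["foo", "bar"], "city": ["Oslo", "Bergen"],
     "x": [1.5, 3.25], "y": [2.0, 4.0]},
    index=pd.Index(["a1", "a2"], name="ID"),
)

# Keep only numeric columns; the string alias and np.number are equivalent here.
numeric = source.select_dtypes(include="number")
numeric_np = source.select_dtypes(include=[np.number])

# "Drop" phrasing: collect the non-numeric column names, then drop them.
non_numeric_cols = source.select_dtypes(exclude="number").columns
dropped = source.drop(columns=non_numeric_cols)

print(numeric.columns.tolist())
```

Because `index_col=0` already moves the ID column into the index, it is untouched by either approach.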

answered Sep 25 '22 08:09

sapo_cosmico

It's a private method, but it will do the trick: source._get_numeric_data()

In [2]: import pandas as pd

In [3]: source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1,2), (3,4)]})

In [4]: source
Out[4]:
     A  B       C
0  foo  1  (1, 2)
1  bar  2  (3, 4)

In [5]: source._get_numeric_data()
Out[5]:
   B
0  1
1  2
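For comparison, here is a sketch of the same frame from the session above filtered with the public `select_dtypes` method instead of the private one. On this input it should keep only column `B`, as `_get_numeric_data()` does. The two are not guaranteed to agree on every dtype (boolean columns are one case worth checking in your pandas version).

```python
import pandas as pd

# Same toy frame as in the IPython session: strings, ints, and tuples.
source = pd.DataFrame({"A": ["foo", "bar"], "B": [1, 2], "C": [(1, 2), (3, 4)]})

# Public API: keep only columns whose dtype is numeric.
# Column A (strings) and column C (tuples, stored as object) are dropped.
numeric = source.select_dtypes(include="number")
print(numeric)
```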

answered Sep 25 '22 08:09

Wouter Overmeire
