
Subset pandas dataframe by dtype [duplicate]

I have a pandas dataframe df with a column, call it A, that contains multiple data types. I want to select all rows of df where A has a particular data type.

For example, suppose that A contains values of types int and str. I want to do something like df[type(df[A])==int].

Jonah asked Dec 13 '22 14:12



1 Answer

Setup

import pandas as pd
df = pd.DataFrame({'A': ['hello', 1, 2, 3, 'bad']})

Because it mixes strings and integers, this entire column is assigned dtype object. If you just want to find the numeric values:

pd.to_numeric(df.A, errors='coerce').dropna() 

1    1.0
2    2.0
3    3.0
Name: A, dtype: float64

However, this also lets through floats, string representations of numbers, and so on. If you really want only the elements that are of type int, you can use a list comprehension:

df.loc[[isinstance(val, int) for val in df.A], 'A']

1    1
2    2
3    3
Name: A, dtype: object

But notice that the dtype of the result is still object.
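If an integer dtype is needed afterwards, one option is to cast the filtered result explicitly. This is only a minimal sketch and not part of the original answer; the names ints and ints_as_int64 are hypothetical, and int64 is assumed to be the desired target dtype.

```python
import pandas as pd

df = pd.DataFrame({'A': ['hello', 1, 2, 3, 'bad']})

# Keep only the elements that are Python ints (same filter as above).
ints = df.loc[[isinstance(val, int) for val in df.A], 'A']

# The filtered Series still has dtype object, so cast it explicitly.
ints_as_int64 = ints.astype('int64')

print(ints.dtype)           # object
print(ints_as_int64.dtype)  # int64
```

The cast only succeeds because every remaining element is an integer; applying it to the unfiltered column would raise an error on the strings.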


If the column contains Boolean values, these will be kept as well, since bool is a subclass of int. If you don't want this behavior, compare the exact type instead of using isinstance, for example type(val) is int.
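The difference between the two checks can be sketched as follows. The column contents here are hypothetical (a bool and a float were added to the example), as are the names with_bools and ints_only.

```python
import pandas as pd

# Hypothetical column that mixes str, int, bool and float values.
df = pd.DataFrame({'A': ['hello', 1, True, 2.5, 3]})

# isinstance accepts bool, because bool is a subclass of int.
with_bools = df.loc[[isinstance(val, int) for val in df.A], 'A']

# An exact type check rejects bool (and any other subclass of int).
ints_only = df.loc[[type(val) is int for val in df.A], 'A']

print(with_bools.tolist())  # [1, True, 3]
print(ints_only.tolist())   # [1, 3]
```

Neither check keeps the float 2.5 or the string, so the only difference between the two results is the Boolean value.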

user3483203 answered Jan 11 '23 03:01
