How to drop rows of the wrong type in Python?

I have a data frame like this:

import pandas as pd
test_df = pd.DataFrame({'foo':['1','2','92#']})
test_df

    foo
0   1
1   2
2   92#

I want to convert the type to int64:

test_df.foo.astype('int64')

but I got an error message because '92#' can't be converted to int64:

ValueError: invalid literal for int() with base 10: '92#'

So I want to drop all rows that can't be converted to int64, and get a result like this:

    foo
0   1
1   2

freefrog asked Jan 29 '23 03:01


1 Answer

If you want a solution that applies to the DataFrame as a whole, call pd.to_numeric through apply, and use the resulting mask to drop rows:

test_df[test_df.apply(pd.to_numeric, errors='coerce').notna()].dropna()

  foo
0   1
1   2
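
To see why the one-liner works, it can be split into its steps. This is only a minimal sketch using the test_df from the question; the names converted, mask, and kept are made up for illustration:

```python
import pandas as pd

test_df = pd.DataFrame({'foo': ['1', '2', '92#']})

# Step 1: try to parse every column; values that fail to parse become NaN.
converted = test_df.apply(pd.to_numeric, errors='coerce')

# Step 2: boolean DataFrame, True wherever parsing succeeded.
mask = converted.notna()

# Step 3: indexing with a boolean DataFrame keeps values where the mask is
# True and puts NaN elsewhere; dropna() then removes rows containing NaN.
kept = test_df[mask].dropna()
print(kept)
```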

This does not modify test_df's values: the rows that are kept still hold the original strings. OTOH, if you want to drop rows and convert the values at the same time, the solution simplifies to:

test_df.apply(pd.to_numeric, errors='coerce').dropna()

   foo
0  1.0
1  2.0

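Add an .astype('int64') call at the end if you want the result typecast to int64. (.astype(int) usually gives the same result, but spelling out 'int64' avoids depending on the platform's default integer width.)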
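
If only one column needs cleaning, the same idea can be applied to that column directly. The following is a sketch rather than the only way to do it; foo_numeric and clean_df are illustrative names that do not appear in the question:

```python
import pandas as pd

test_df = pd.DataFrame({'foo': ['1', '2', '92#']})

# Parse the column; '92#' cannot be parsed and becomes NaN.
foo_numeric = pd.to_numeric(test_df['foo'], errors='coerce')

# Keep only the rows that parsed, then store the parsed values as int64.
clean_df = test_df.loc[foo_numeric.notna()].copy()
clean_df['foo'] = foo_numeric.dropna().astype('int64')
print(clean_df.dtypes)
```

Keep in mind that pd.to_numeric also accepts strings such as '1.5'; those rows would survive the filter and be truncated by the integer cast, so an extra check may be needed if such values can occur.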

cs95 answered Jan 31 '23 08:01