
drop rows with errors for pandas data coercion

I have a DataFrame whose columns I need to convert to floats and ints, but it has bad rows, i.e., values in a column that should be a float or an integer are instead strings.

If I use df.bad.astype(float), I get an error, which is expected.

If I use pd.to_numeric(df.bad, errors='coerce'), bad values are replaced with np.nan, which is also according to spec and reasonable. (astype itself does not accept errors='coerce'; it only takes 'raise' or 'ignore'.)

There is also errors='ignore', another option that ignores the errors and leaves the erroring values alone.

What I actually want is not to ignore the errors but to drop the rows with bad values. How can I do this?

I can ignore the errors and do some type checking, but that's not an ideal solution, and there might be something more idiomatic to do this.

Example

import pandas as pd
test = pd.DataFrame(["3", "4", "problem"], columns=["bad"])
test.bad.astype(float)  # ValueError: could not convert string to float: 'problem'

I want something like this:

pd.to_numeric(df.bad, errors='drop')

This would return a DataFrame with only the two good rows.

Gijs asked Jul 11 '16 10:07


1 Answer

Since the bad values are replaced with np.nan, wouldn't it simply be a matter of calling df.dropna() afterwards to get rid of the bad rows?
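A minimal sketch of that idea, using the test frame from the question. The subset= argument and the variable name clean are my additions for illustration; subset limits the drop to NaNs in the converted column rather than NaNs anywhere in the frame:

```python
import pandas as pd

test = pd.DataFrame(["3", "4", "problem"], columns=["bad"])

# Coerce unparseable strings to NaN instead of raising.
test["bad"] = pd.to_numeric(test["bad"], errors="coerce")

# Drop only the rows where the converted column is NaN.
clean = test.dropna(subset=["bad"])

print(clean)
```

Note that this also drops rows that were already NaN before the conversion, since they cannot be told apart from the coerced ones at that point.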

EDIT: Since you need to keep the rows that were NaN to begin with, maybe you could use df.fillna() before calling pd.to_numeric.
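As an alternative to fillna(), and a different technique from the one suggested above, one way to keep the rows that were NaN from the start is to compare the coerced result with the original column and drop only the rows where coercion introduced a new NaN. A rough sketch, with a made-up frame and hypothetical variable names (converted, failed, result):

```python
import numpy as np
import pandas as pd

# Illustrative data: one genuinely missing value and one unparseable string.
df = pd.DataFrame({"bad": ["3", np.nan, "problem", "4"]})

converted = pd.to_numeric(df["bad"], errors="coerce")

# A row failed conversion only if it is NaN now but was not missing before.
failed = converted.isna() & df["bad"].notna()

# Keep the other rows and swap in the numeric values (aligned on the index).
result = df.loc[~failed].assign(bad=converted[~failed])

print(result)
```

Here the row holding "problem" is dropped, while the row that was NaN originally is kept as NaN in a float column.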

SerialDev answered Sep 21 '22 03:09