Pandas - Delete Rows with only NaN values

I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically: 7 or more.

I tried using the dropna function several ways but it seems clear that it greedily deletes columns or rows that contain any NaN values.

This question (Slice Pandas DataFrame by Row) shows me that if I can compile a list of the rows that have too many NaN values, I can delete them all with a simple

df.drop(rows)

I know I can count non-null values using the count function, which I could then subtract from the total to get the NaN count (Is there a direct way to count NaN values in a row?). But even so, I am not sure how to write a loop that goes through a DataFrame row by row.
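For reference, here is a minimal sketch of both ways to count NaN values per row: directly with isna().sum(axis=1), and indirectly by subtracting count(axis=1) from the number of columns. The DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 3 rows, 4 columns.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [2.0, np.nan, 5.0],
        "c": [np.nan, np.nan, 6.0],
        "d": [4.0, np.nan, 7.0],
    }
)

# Direct count: isna() marks missing cells, sum(axis=1) adds them up per row.
nan_per_row = df.isna().sum(axis=1)

# Indirect count: number of columns minus the non-null count per row.
nan_via_count = df.shape[1] - df.count(axis=1)

print(nan_per_row)
print(nan_via_count)
```

Both Series are indexed by the row labels, so either one can be compared against a threshold to pick out the rows to drop.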

Here's some pseudo-code that I think is on the right track:

### LOOP FOR ADDRESSING EACH row:
    m = total - row.count()
    if (m >= 7):
        df.drop(row)
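
A runnable sketch of that loop might look like the following. It assumes "total" means the number of columns and that the index labels are unique; the helper name drop_rows_with_many_nans and the sample data are hypothetical. It collects the labels first and calls df.drop once, rather than dropping inside the loop.

```python
import numpy as np
import pandas as pd


def drop_rows_with_many_nans(df, max_nans):
    """Return a copy of df without rows that have max_nans or more NaN values."""
    total = len(df.columns)
    rows = []
    # iterrows() yields (index label, row as a Series) pairs.
    for label, row in df.iterrows():
        m = total - row.count()  # count() returns the non-null count
        if m >= max_nans:
            rows.append(label)
    # drop() returns a new DataFrame; the input is left unchanged.
    return df.drop(rows)


# Hypothetical data: 3 rows, 8 columns.
df = pd.DataFrame(np.arange(24, dtype=float).reshape(3, 8))
df.iloc[0, :7] = np.nan  # row 0 has 7 NaN values
df.iloc[1, :6] = np.nan  # row 1 has 6 NaN values

result = drop_rows_with_many_nans(df, 7)
print(result.index.tolist())
```

Looping with iterrows is slow on large frames, so this is mainly useful for seeing the logic spelled out step by step.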

I am still new to Pandas so I'm very open to other ways of solving this problem; whether they're simpler or more complex.

Slavatron asked Aug 05 '14 18:08



1 Answer

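The way to do this is to take the number of columns, work out the minimum number of non-NaN values a row must have to be kept, and pass that to dropna as thresh. A row with 7 or more NaN values has at most len(df.columns) - 7 non-NaN values, so requiring at least len(df.columns) - 6 drops exactly those rows:

df = df.dropna(thresh=len(df.columns) - 6)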
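
As a small check of how thresh behaves, here is a sketch on made-up data. thresh is the minimum number of non-NaN values a row needs in order to be kept, so it is derived from the number of columns, not the number of rows. The names max_nans and min_non_nan are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 rows, 10 columns.
df = pd.DataFrame(np.ones((4, 10)))
df.iloc[0, :8] = np.nan  # row 0: 8 NaN values
df.iloc[1, :7] = np.nan  # row 1: 7 NaN values
df.iloc[2, :6] = np.nan  # row 2: 6 NaN values
# row 3 has no NaN values

max_nans = 7  # drop rows with this many NaN values or more

# A kept row may have at most max_nans - 1 NaN values,
# i.e. at least len(df.columns) - max_nans + 1 non-NaN values.
min_non_nan = len(df.columns) - max_nans + 1
kept = df.dropna(thresh=min_non_nan)

# The same selection written as an explicit boolean mask.
kept_mask = df[df.isna().sum(axis=1) < max_nans]

print(kept.index.tolist())
```

The boolean-mask form is slightly longer but makes the "7 or more" condition explicit, which can help avoid off-by-one mistakes in the threshold.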

See the pandas docs for DataFrame.dropna.

EdChum answered Sep 20 '22 09:09