dropna() appears to ignore the thresh argument and a list passed to the axis argument

Tags: python, pandas, nan

I am experimenting with dropna(), having read the documentation.

I created a sample dataframe to play with:

        col1    col2    col3    col4
  0      a       1.0    2.0       3
  1      b       NaN    NaN       6
  2      c       NaN    8.0       9
  3      d       NaN    11.0     12
  4      e       13.0   14.0     15
  5      f       17.0   18.0     19
  6      g       21.0   22.0     23
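
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the original construction code was not posted, so the dtypes (floats for col2 and col3, integers for col4) are an assumption based on the printed output.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame; np.nan marks the missing cells.
# The dtypes are assumed from the printed output in the question.
df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": [1.0, np.nan, np.nan, np.nan, 13.0, 17.0, 21.0],
    "col3": [2.0, np.nan, 8.0, 11.0, 14.0, 18.0, 22.0],
    "col4": [3, 6, 9, 12, 15, 19, 23],
})
print(df)
```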

Now the problems I encountered were with the argument thresh and with passing a list to the argument axis.

a) The threshold is ignored and dropna() does not drop any row

    df.dropna(thresh=1)

and df.dropna(thresh=2) both return the original df unaltered.

b) A list passed to the axis argument, indicating that dropna() should operate on both axes simultaneously, is ignored. Axis 0, the default, is used instead.

   df.dropna(axis=[0,1]) 

returns:

  col1  col2    col3    col4
0   a   1.0      2.0    3
4   e   13.0    14.0    15
5   f   17.0    18.0    19
6   g   21.0    22.0    23

I have read the documentation carefully and searched Stack Overflow, but I still cannot figure out what I am doing wrong.

Your advice will be appreciated.

im7 asked Sep 12 '25 01:09


2 Answers

dropna is working as expected.

For your first statement, a), df.dropna(thresh=1) iterates through all the rows and keeps each row that has at least 1 non-NA value. Every row has at least one non-NA value, so nothing is dropped. The same is true for thresh=2: every row has at least 2 non-NA values.
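
A quick way to check this is to count the non-NA values in each row. The sketch below rebuilds the frame from the question (the construction code is an assumption, since only the printed frame was shown) and compares the counts against the two thresholds.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (construction assumed from its printed form).
df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": [1.0, np.nan, np.nan, np.nan, 13.0, 17.0, 21.0],
    "col3": [2.0, np.nan, 8.0, 11.0, 14.0, 18.0, 22.0],
    "col4": [3, 6, 9, 12, 15, 19, 23],
})

# Number of non-NA values in each row.
non_na = df.count(axis=1)
print(non_na.tolist())  # [4, 2, 3, 3, 4, 4, 4]

# The smallest count is 2, so thresh=1 and thresh=2 drop nothing.
kept_1 = df.dropna(thresh=1)
kept_2 = df.dropna(thresh=2)
print(len(kept_1), len(kept_2))  # 7 7
```

Row 1 has the fewest non-NA values (two), so a row is only dropped once thresh reaches 3.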

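For your second question, df.dropna(axis=[0,1]): the order of the list matters. Here, the rows containing NaN are dropped first, and only then are the columns checked. Once rows 1, 2 and 3 are gone, no NaN values remain, so no column is dropped and the result looks the same as with axis=0.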
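
The effect of the order can be imitated by chaining single-axis calls. This is a sketch of the behaviour, not pandas' internal code; chaining is used here because, as far as I know, newer pandas releases no longer accept a list for axis. The frame construction is likewise assumed from the printed output in the question.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (construction assumed from its printed form).
df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": [1.0, np.nan, np.nan, np.nan, 13.0, 17.0, 21.0],
    "col3": [2.0, np.nan, 8.0, 11.0, 14.0, 18.0, 22.0],
    "col4": [3, 6, 9, 12, 15, 19, 23],
})

# Rows first, then columns: the NaN rows go, after which no column has NaN.
rows_then_cols = df.dropna(axis=0).dropna(axis=1)
print(rows_then_cols.shape)  # (4, 4)

# Columns first, then rows: col2 and col3 go, after which no row has NaN.
cols_then_rows = df.dropna(axis=1).dropna(axis=0)
print(cols_then_rows.shape)  # (7, 2)
```

Dropping rows first keeps all four columns but only rows 0, 4, 5 and 6; dropping columns first keeps all seven rows but only col1 and col4.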

Ted Petrou answered Sep 13 '25 15:09


As @Ted Petrou has already mentioned, thresh is the minimum number of non-NA values a row must have in order to be kept.

So if we want to keep only those rows that have at least 3 non-NA values:

In [32]: df.dropna(thresh=3)
Out[32]:
  col1  col2  col3  col4
0    a   1.0   2.0     3
2    c   NaN   8.0     9
3    d   NaN  11.0    12
4    e  13.0  14.0    15
5    f  17.0  18.0    19
6    g  21.0  22.0    23

In [33]: df.dropna(thresh=4)
Out[33]:
  col1  col2  col3  col4
0    a   1.0   2.0     3
4    e  13.0  14.0    15
5    f  17.0  18.0    19
6    g  21.0  22.0    23

Internally, pandas uses something similar to the following mask:

In [34]: thresh = 3

In [35]: df.count(1) >= thresh
Out[35]:
0     True
1    False
2     True
3     True
4     True
5     True
6     True
dtype: bool
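
To illustrate, filtering with that mask by hand should give the same rows as dropna(thresh=...). This is a sketch of the equivalence for this particular frame, not a copy of the pandas source; the frame construction is assumed from the printed output in the question.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (construction assumed from its printed form).
df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": [1.0, np.nan, np.nan, np.nan, 13.0, 17.0, 21.0],
    "col3": [2.0, np.nan, 8.0, 11.0, 14.0, 18.0, 22.0],
    "col4": [3, 6, 9, 12, 15, 19, 23],
})

thresh = 3

# Keep a row when it has at least `thresh` non-NA values.
mask = df.count(axis=1) >= thresh
filtered = df[mask]

# Compare the manual filter with the built-in call.
same = filtered.equals(df.dropna(thresh=thresh))
print(filtered.index.tolist())  # [0, 2, 3, 4, 5, 6]
print(same)  # True
```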

MaxU - stop WAR against UA answered Sep 13 '25 14:09