
thresh in dropna for DataFrame in pandas in python

Tags: python, pandas

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan
df1.dropna(thresh=1, axis=1)

It seems that no column containing NaN values has been dropped:

    0     1     2
0   0   NaN   NaN
1   3   NaN   NaN
2   6   NaN   8.0
3   9   NaN  11.0
4  12  13.0  14.0

If I run

df1.dropna(thresh=2,axis=1)

why does it give the following?

    0     2
0   0   NaN
1   3   NaN
2   6   8.0
3   9  11.0
4  12  14.0

I just don't understand what thresh is doing here. If a column has more than one NaN value, shouldn't the column be dropped?

asked Jul 29 '18 22:07 by AAA


1 Answer

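thresh=N requires that a column has at least N non-NaNs to survive; it counts the values that are present, not the ones that are missing. In the first example, every column has at least one non-NaN (column 1 has exactly one, column 2 has three), so all of them survive. In the second example, of the two columns that contain NaNs, only the last one has at least two non-NaNs, so it survives, but column 1 is dropped.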

Try setting thresh to 4 to get a better sense of what's happening.
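To make the counting visible, here is a minimal sketch that rebuilds the frame from the question and compares the per-column non-NaN counts with what dropna keeps. The names non_nan and kept are only illustrative, and the frame is created with a float dtype (an assumption, not part of the original question) so that NaN can be assigned without any dtype upcasting.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; float dtype is an assumption made here
# so that NaN can be assigned without upcasting an integer column.
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# Number of non-NaN values in each column: this is what thresh is compared to.
non_nan = df1.notna().sum()

# Column labels that survive dropna(thresh=n, axis=1) for a few thresholds.
kept = {n: list(df1.dropna(thresh=n, axis=1).columns) for n in (1, 2, 4)}

print(non_nan)
print(kept)
```

Column 1 holds a single non-NaN value and column 2 holds three, so column 1 disappears as soon as thresh reaches 2, and column 2 disappears once thresh reaches 4.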

answered Nov 15 '22 15:11 by DYZ