I have a pandas DataFrame with several missing values. I noticed that the non-missing values are close to each other, so I would like to impute each missing value by randomly choosing one of the non-missing values.
For instance:
import pandas as pd
import random
import numpy as np
foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B':[np.nan, 4, 2, np.nan, 5]})
foo
     A    B
0    2  NaN
1    3    4
2  NaN    2
3    5  NaN
4  NaN    5
I would like, for instance, foo['A'][2]=2
and foo['A'][4]=3
The shape of my pandas DataFrame is (6940,154).
I tried this:
foo['A'] = foo['A'].fillna(random.choice(foo['A'].values.tolist()))
But it is not working. Could you help me achieve that? Best regards.
The notna() function detects existing (non-missing) values in a DataFrame. It returns a boolean object of the same shape as the object it is applied to, with True where a value is present and False where it is NA.
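As a small illustration of how notna() can be used here, the sketch below applies it to the foo DataFrame from the question to build the pool of observed values for one column. The names present and pool are only illustrative.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B': [np.nan, 4, 2, np.nan, 5]})

# True where a value is present, False where it is NaN
present = foo['A'].notna()

# The observed values that a random imputation could draw from
# (illustrative name, not part of pandas)
pool = foo.loc[present, 'A'].tolist()
```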
pandas DataFrame methods such as fillna() can be used to replace missing values. The fill value can be computed from the data with methods such as mean(), median() and mode().
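For comparison with random imputation, here is a minimal sketch of filling with a column statistic, again using the foo DataFrame from the question. With this approach every NaN in a column receives the same value. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B': [np.nan, 4, 2, np.nan, 5]})

# Passing a Series to fillna fills each column with the entry for that column
mean_filled = foo.fillna(foo.mean())
median_filled = foo.fillna(foo.median())

# mode() returns a DataFrame because several values can tie;
# taking the first row picks the smallest mode of each column
mode_filled = foo.fillna(foo.mode().iloc[0])
```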
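This works well for me on a pandas DataFrame:
import random

def randomiseMissingData(df2):
    "randomise missing data for DataFrame (within a column)"
    df = df2.copy()
    for col in df.columns:
        mask = df[col].isnull()
        if not mask.any():
            continue  # nothing to fill in this column
        # draw one observed value (with replacement) for every missing cell
        samples = random.choices(df.loc[~mask, col].tolist(), k=int(mask.sum()))
        df.loc[mask, col] = samples
    return df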
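The function above relies on Python's random module. If reproducible results are wanted, a similar idea can be sketched with a seeded NumPy generator. The name impute_random and its seed parameter are hypothetical, chosen only for this example, and columns with no observed values are simply left untouched.

```python
import numpy as np
import pandas as pd

def impute_random(df, seed=None):
    """Fill each NaN with a value drawn at random from the same column.

    Hypothetical helper for illustration; `seed` makes the draw repeatable.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in out.columns:
        mask = out[col].isna()
        pool = out.loc[~mask, col].to_numpy()
        # Skip columns with nothing to fill or nothing to sample from
        if mask.any() and len(pool) > 0:
            # Sampling is with replacement, one draw per missing cell
            out.loc[mask, col] = rng.choice(pool, size=int(mask.sum()))
    return out

foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B': [np.nan, 4, 2, np.nan, 5]})
filled = impute_random(foo, seed=0)
```

Because the function works on a copy, foo itself keeps its NaN values and only filled is complete.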
I did this to fill the NaN values with a random non-NaN value. Note that random.choice is called only once, so every NaN in the column gets the same value:
import random
df['column'] = df['column'].fillna(random.choice(df['column'].dropna().tolist()))
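You can use the pandas fillna method together with random.choice to fill the missing values of a particular column with values selected at random from that column.
import random
import pandas as pd
pool = df["column"].dropna().tolist()  # observed (non-NaN) values
fill = pd.Series([random.choice(pool) for _ in df.index], index=df.index)
df["column"] = df["column"].fillna(fill)  # fillna aligns the Series on the index
Here "column" is the column whose NaN values you want to replace with randomly chosen non-NaN values.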
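Here is another approach, which builds on the first answer and uses np.isnan to check whether a value is NaN, as described in the NumPy documentation. Note that it draws any integer between the column's minimum and maximum (inclusive), not only values that actually occur in the column:
lo, hi = int(foo['A'].min()), int(foo['A'].max())
foo['A'] = foo['A'].apply(lambda x: np.random.choice(np.arange(lo, hi + 1)) if np.isnan(x) else x)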