I am trying to fill the NaN values of a pandas DataFrame with random values taken from each column, where each value's chance of being picked depends on its frequency in that column. I have this:
```python
def MissingRandom(dataframe):
    import random
    dataframe = dataframe.apply(lambda x: x.fillna(
        random.choices(x.value_counts().keys(),
                       weights = list(x.value_counts()))[0]))
    return dataframe
```
I get the DataFrame filled in with random data, but it's the same value for every missing entry in a column. I would like each missing entry in the column to get its own random value, but I have not been able to do it. Could anybody help me?
Thank you very much
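Why every gap gets the same value: in the question's code, `random.choices(...)[0]` is evaluated once per column, so `fillna` receives a single scalar and broadcasts it to all missing cells of that column. A minimal sketch that keeps the question's `apply`/`fillna` structure, assuming the default unique index: draw one value per missing cell and pass the draws to `fillna` as a Series indexed by the missing positions, so each NaN is filled by its own draw. The names `fill_one_column` and `missing_random` are hypothetical.

```python
import random

import numpy as np
import pandas as pd


def fill_one_column(x: pd.Series) -> pd.Series:
    """Fill each NaN in x with an independent draw weighted by observed frequency."""
    counts = x.value_counts()          # observed values and how often each occurs (NaN excluded)
    missing = x.index[x.isna()]        # index labels of the missing cells
    if counts.empty or len(missing) == 0:
        return x                       # nothing to sample from, or nothing to fill
    draws = random.choices(list(counts.index), weights=list(counts), k=len(missing))
    # fillna with a Series fills by index label, so each NaN gets its own draw
    return x.fillna(pd.Series(draws, index=missing))


def missing_random(dataframe: pd.DataFrame) -> pd.DataFrame:
    # apply calls fill_one_column once per column and returns a new DataFrame
    return dataframe.apply(fill_one_column)


df = pd.DataFrame({"a": [1, np.nan, 1, 2, np.nan, np.nan],
                   "b": ["x", None, "y", "x", None, "x"]})
print(missing_random(df))
```

The key change from the question's version is `k=len(missing)`: one weighted draw per missing cell instead of one per column.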
Please see my solution below. First, I created a function that fills a Series based on your criteria (frequencies as weights in the random function), and then we apply this function to all columns of the DataFrame:
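```python
import random
from collections import Counter

def fillcolumn(ser):
    ser = ser.copy()                  # work on a copy of the column
    mask = ser.isna()                 # positions that need filling
    cna = int(mask.sum())             # number of missing values
    d = Counter(ser[~mask])           # frequency of each observed value
    if cna == 0 or not d:
        return ser                    # nothing to fill, or no observed values to sample from
    # draw one value per missing cell, weighted by its frequency
    m = random.choices(list(d.keys()), weights=list(d.values()), k=cna)
    ser.loc[mask] = m
    return ser

for i in df.columns:
    df[i] = fillcolumn(df[i])
```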
Your full code:
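```python
import random
from collections import Counter

def MissingRandom(dataframe):
    def fillcolumn(ser):
        ser = ser.copy()              # work on a copy of the column
        mask = ser.isna()             # positions that need filling
        cna = int(mask.sum())         # number of missing values
        d = Counter(ser[~mask])       # frequency of each observed value
        if cna == 0 or not d:
            return ser                # nothing to fill, or no observed values to sample from
        # draw one value per missing cell, weighted by its frequency
        m = random.choices(list(d.keys()), weights=list(d.values()), k=cna)
        ser.loc[mask] = m
        return ser

    # note: this assigns back into the DataFrame that was passed in
    for i in dataframe.columns:
        dataframe[i] = fillcolumn(dataframe[i])
    return dataframe
```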
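If reproducible fills are needed, the same idea can be sketched with NumPy's `Generator.choice` in place of `random.choices`, using `value_counts(normalize=True)` as the probabilities. This is a swapped-in technique, not the answer's code; `fill_with_frequencies` and its `seed` parameter are hypothetical names, and unlike `MissingRandom` it returns a copy instead of modifying the input.

```python
import numpy as np
import pandas as pd


def fill_with_frequencies(df: pd.DataFrame, seed=None) -> pd.DataFrame:
    """Return a copy of df where every NaN is replaced by an independent draw
    from that column's observed values, weighted by their relative frequency."""
    rng = np.random.default_rng(seed)   # same seed => same fills
    out = df.copy()                     # leave the caller's DataFrame untouched
    for col in out.columns:
        mask = out[col].isna()
        n_missing = int(mask.sum())
        probs = out[col].value_counts(normalize=True)  # relative frequencies, NaN excluded
        if n_missing == 0 or probs.empty:
            continue                    # nothing to fill, or no observed values to sample
        draws = rng.choice(probs.index.to_numpy(), size=n_missing, p=probs.to_numpy())
        out.loc[mask, col] = draws
    return out


df = pd.DataFrame({"a": [1, np.nan, 1, 2, np.nan], "b": ["x", None, "y", "x", None]})
print(fill_with_frequencies(df, seed=0))
```

Passing the same `seed` twice gives identical results, which can help when the filled data feeds a test or an experiment that must be rerun.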