Randomly insert NA's values in a pandas dataframe - with no rows completely missing

How can I randomly set some values to missing in a pandas DataFrame, as in "Randomly insert NA's values in a pandas dataframe", while making sure that no row ends up consisting entirely of missing values?

Edit: Sorry for not stating this explicitly again (it was in the question I referenced, though): I need to be able to specify what percentage of the cells, for example 10%, should become NaN (or rather, as close to 10% as can be achieved given the data frame's size), as opposed to, say, clearing each cell independently with a marginal per-cell probability of 10%.

Make42 asked Feb 06 '23 03:02


2 Answers

You can use DataFrame.mask with a NumPy boolean mask; the way the mask is built comes from an answer to a question of mine:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9]})

print (df)
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

np.random.seed(100)
# each cell is True (to be replaced by NaN) with probability 0.5
mask = np.random.choice([True, False], size=df.shape)
print (mask)
[[ True  True False]
 [False False False]
 [ True  True  True]] -> problematic values - all True

# in rows that are entirely True, reset the last column so one value survives
mask[mask.all(axis=1), -1] = False
print (mask)
[[ True  True False]
 [False False False]
 [ True  True False]]

print (df.mask(mask))
     A    B  C
0  NaN  NaN  7
1  2.0  5.0  8
2  NaN  NaN  9
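
The steps above could be wrapped in a small helper. This is only a minimal sketch: the function name `mask_random_cells` and its parameters `p` and `seed` are hypothetical and not part of the answer. It assumes the frame has at least one column, and it clears each cell independently with probability `p`, so it does not hit an exact overall fraction.

```python
import numpy as np
import pandas as pd


def mask_random_cells(df, p=0.5, seed=None):
    """Return a copy of df where each cell is NaN with probability p,
    while keeping at least one value in every row (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # True marks a cell that will be replaced by NaN
    mask = rng.random(df.shape) < p
    # Rows that came out entirely True: keep the last column's value
    mask[mask.all(axis=1), -1] = False
    return df.mask(mask)


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
out = mask_random_cells(df, p=0.5, seed=0)
print(out)
```

Because `DataFrame.mask` returns a new frame, the original `df` is left unchanged.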
jezrael answered Feb 07 '23 18:02


Here is an answer based on Randomly insert NA's values in a pandas dataframe:

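import collections
import random

import numpy as np

replaced = collections.defaultdict(set)  # row -> columns already set to NaN
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
random.shuffle(ix)
to_replace = int(round(.1*len(ix)))  # target number of NaN cells (10%)
for row, col in ix:
    # stop once the target is reached (also covers a target of 0)
    if to_replace <= 0:
        break
    # always leave at least one value in each row
    if len(replaced[row]) < df.shape[1] - 1:
        df.iloc[row, col] = np.nan
        to_replace -= 1
        replaced[row].add(col)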

Shuffling puts the cell indexes in random order, and the check on len(replaced[row]) prevents an entire row from being replaced.
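
The same idea can be sketched as a function that returns a copy instead of modifying `df` in place. The name `insert_nans` and the parameters `frac` and `seed` are assumptions made for illustration, not part of the answer. The sketch assumes the frame has no missing values to begin with, and it collects the chosen cells in a boolean mask and applies it once with `DataFrame.mask`.

```python
import random

import numpy as np
import pandas as pd


def insert_nans(df, frac=0.1, seed=None):
    """Return a copy of df with about frac of its cells set to NaN,
    never blanking a whole row (hypothetical helper)."""
    rng = random.Random(seed)
    n_rows, n_cols = df.shape
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    rng.shuffle(cells)
    remaining = int(round(frac * len(cells)))  # target number of NaN cells
    mask = np.zeros(df.shape, dtype=bool)
    per_row = [0] * n_rows  # how many cells are already chosen in each row
    for r, c in cells:
        if remaining <= 0:
            break
        # skip the cell if taking it would empty the row
        if per_row[r] < n_cols - 1:
            mask[r, c] = True
            per_row[r] += 1
            remaining -= 1
    return df.mask(mask)


df = pd.DataFrame(np.arange(100).reshape(20, 5), columns=list('ABCDE'))
out = insert_nans(df, frac=0.1, seed=0)
print(out.isna().sum().sum())
```

The target count is reached whenever it does not exceed `n_rows * (n_cols - 1)`; beyond that, the per-row limit caps the number of missing cells.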

AndreyF answered Feb 07 '23 18:02