
Randomly insert NA values in a pandas DataFrame

How can I randomly insert np.nan values into a DataFrame? Let's say I want 10% of the values in my DataFrame to be null.

My data looks like this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(5, 3),
                      index=['a', 'b', 'c', 'd', 'e'],
                      columns=['one', 'two', 'three'])

            one       two     three
    a  0.695132  1.044791 -1.059536
    b -1.075105  0.825776  1.899795
    c -0.678980  0.051959 -0.691405
    d -0.182928  1.455268 -1.032353
    e  0.205094  0.714192 -0.938242

Is there an easy way to insert the null values?

asked by mitsi, Aug 20 '16 14:08




1 Answer

Here's a way to clear exactly 10% of cells (or rather, as close to 10% as can be achieved with the existing data frame's size).

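    import random
    import numpy as np

    # All (row, col) integer positions in the frame
    ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
    # Sample 10% of the positions without replacement and blank them in place
    for row, col in random.sample(ix, int(round(.1*len(ix)))):
        df.iat[row, col] = np.nan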
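The same exact-count idea can also be written without an explicit Python loop. This is only a minimal sketch and not part of the original answer: the helper name `insert_nan_exact` and its `frac` and `seed` parameters are hypothetical, and it assumes every column is a float column so that NaN can be stored without a dtype change.

```python
import numpy as np
import pandas as pd


def insert_nan_exact(df, frac=0.1, seed=None):
    """Return a copy of df with round(frac * df.size) cells set to NaN.

    Hypothetical helper for illustration; assumes float columns.
    """
    rng = np.random.default_rng(seed)
    n_cells = df.size
    n_nan = int(round(frac * n_cells))
    # Pick distinct flat positions, then build a boolean mask of df's shape.
    flat_idx = rng.choice(n_cells, size=n_nan, replace=False)
    mask = np.zeros(n_cells, dtype=bool)
    mask[flat_idx] = True
    # DataFrame.mask replaces cells where the condition is True with NaN.
    return df.mask(mask.reshape(df.shape))


df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])
result = insert_nan_exact(df, frac=0.1, seed=0)
print(result.isna().sum().sum())  # 2, since round(0.1 * 15) == 2
```

Unlike the loop above, this sketch returns a new frame and leaves `df` untouched, which may or may not be what you want.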

Here's a way to clear cells independently with a per-cell probability of 10%.

    df = df.mask(np.random.random(df.shape) < .1)
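With the per-cell approach the number of NaN cells is itself random, so the realized fraction is only approximately 10%. The following sketch illustrates this on a larger frame; the name `big`, the 1000-row size, and the seed are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded only to make the sketch repeatable
big = pd.DataFrame(rng.standard_normal((1000, 3)),
                   columns=['one', 'two', 'three'])

# Each cell is blanked independently with probability 0.1.
masked = big.mask(rng.random(big.shape) < 0.1)

# Realized fraction of NaN cells: near 0.1, but usually not exactly 0.1.
frac_nan = masked.isna().to_numpy().mean()
print(f"fraction of NaN cells: {frac_nan:.3f}")
```

On a frame as small as the 5x3 example in the question, this method leaves no NaN at all roughly 21% of the time (0.9 to the power 15), so the exact-count approach may be preferable for small data.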
answered by Kodiologist, Sep 19 '22 20:09