Randomly insert NAs into dataframe proportionally

I have a complete dataframe. I want 20% of the values in the dataframe to be replaced by NAs to simulate random missing data.

A <- c(1:10)
B <- c(11:20)
C <- c(21:30)
df <- data.frame(A, B, C)

Can anyone suggest a quick way of doing that?

Filly asked Dec 13 '14 00:12


1 Answer

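You can unlist the data.frame, set a random sample of positions to NA, and then rebuild the data.frame. Because unlist() produces a single vector, this assumes all columns share one type, as they do in the question.

# Flatten the data frame into one vector of all values
vals <- unlist(df)
# Number of values to blank out: 20% of all cells
n <- round(length(vals) * 0.20)
# Sample positions (not values) without replacement and set them to NA
vals[sample(length(vals), n)] <- NA
# Rebuild a data frame with the original shape and column names
df <- as.data.frame(matrix(vals, ncol = ncol(df), dimnames = list(NULL, names(df))))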

There are several other ways to do this with sample().
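
As one illustration of another sample()-based variant, here is a minimal sketch that assumes you want to keep each column's type and name rather than flattening everything into one vector. It builds a logical mask with the same shape as the data frame, marks a random 20% of its cells, and assigns NA through the mask. The names `prop`, `mask`, and `df_na`, and the fixed seed, are illustrative choices and not part of the original answer.

```r
# Example data from the question
df <- data.frame(A = 1:10, B = 11:20, C = 21:30)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

prop <- 0.20                            # share of cells to turn into NA
n_na <- round(prop * nrow(df) * ncol(df))  # number of cells to blank out

# Logical mask with the same dimensions as df, all FALSE to start
mask <- matrix(FALSE, nrow = nrow(df), ncol = ncol(df))

# Mark n_na randomly chosen cell positions (sampled without replacement)
mask[sample(length(mask), n_na)] <- TRUE

# Assign NA wherever the mask is TRUE; columns keep their names and types
df_na <- df
df_na[mask] <- NA

sum(is.na(df_na))  # count of NA cells in the result
```

Sampling positions without replacement gives exactly `n_na` missing cells overall, although the count per column will vary from run to run. If you would rather have each cell go missing independently with probability 0.20, so that the total is only approximately 20%, a mask such as `matrix(runif(length(mask)) < prop, nrow = nrow(df))` is one possible substitute.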

darwin answered Oct 07 '22 07:10