Randomly insert NAs into dataframe proportionally

Tags:

dataframe

r

missing-data

na

I have a complete dataframe. I want 20% of the values in the dataframe to be replaced by NAs to simulate random missing data.

A <- 1:10
B <- 11:20
C <- 21:30
df <- data.frame(A, B, C)

Can anyone suggest a quick way of doing that?

asked Dec 13 '14 00:12

Filly

1 Answer

You can unlist the data.frame into a single vector, set a random sample of its positions to NA, and then put the result back into a data.frame.

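# Flatten the data.frame column by column into a single vector
vals <- unlist(df)
# Number of cells to blank out: 20% of all values
n <- round(length(vals) * 0.20)
# Sample positions (not the values themselves) and set them to NA
vals[sample(length(vals), n)] <- NA
# Rebuild the data.frame with the original shape and column names
df_na <- as.data.frame(matrix(vals, ncol = ncol(df), dimnames = list(NULL, names(df))))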

There are several other ways to do this with sample().
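As one possible variant, here is a minimal sketch that avoids flattening the data.frame: it builds a logical mask of the same shape and uses it to assign NA. The helper name `insert_nas`, its `prop` argument, and the seed are assumptions made for illustration and are not part of the original answer.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

df <- data.frame(A = 1:10, B = 11:20, C = 21:30)

# Hypothetical helper: replace a fraction `prop` of all cells with NA
insert_nas <- function(data, prop = 0.2) {
  # Logical mask with the same shape as the data.frame, initially all FALSE
  mask <- matrix(FALSE, nrow = nrow(data), ncol = ncol(data))
  # Number of cells to blank out
  n_na <- round(length(mask) * prop)
  # Pick cell positions without replacement and flag them
  mask[sample(length(mask), n_na)] <- TRUE
  # Assign NA wherever the mask is TRUE
  data[mask] <- NA
  data
}

df_na <- insert_nas(df, 0.2)
sum(is.na(df_na))  # 6 of the 30 cells
```

Because positions are sampled without replacement, exactly `round(nrow * ncol * prop)` cells become NA, and the column names and types stay as they were. The NAs are spread over the whole data.frame rather than 20% per column, so individual columns may end up with more or fewer.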

answered Oct 07 '22 07:10

darwin
