I have a complete dataframe. I want 20% of the values in the dataframe to be replaced by NAs to simulate random missing data.
A <- c(1:10)
B <- c(11:20)
C <- c(21:30)
df <- data.frame(A, B, C)
Can anyone suggest a quick way of doing that?
In summary, we added NaNs at random to a pandas DataFrame. We used NumPy's random module to create a random boolean array with approximately the desired fraction of True values, and the pandas mask function to set the corresponding entries to NaN.
Generate Random Integers under a Single DataFrame Column
Here is a template that you may use to generate random integers under a single DataFrame column:
import numpy as np
import pandas as pd
data = np.random.randint(lowest integer, highest integer, size=number of random integers)
df = pd.DataFrame(data, columns=['column name'])
print(df)
More specifically, you can place np.nan each time you want to add a NaN value in the DataFrame. For example, in the code below, there are 4 instances of np.nan under a single DataFrame column:
Given below are 3 methods to do the same. The ravel() function returns a contiguous flattened array (a 1D array containing all the elements of the input array, with the same type). A copy is made only if needed. Choose random indices at which to set the value to NaN. Example 2: adding NaN values, but using the randint function to create the data.
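You can unlist the data.frame, set a random sample of positions to NA, and then put the result back into a data.frame.
vals <- unlist(df)                     # flatten all columns into one vector
n <- round(length(vals) * 0.2)         # 20% of the values, as asked
vals[sample(length(vals), n)] <- NA    # sample positions, not the values themselves
as.data.frame(matrix(vals, ncol = 3))  # rebuild the three columns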
It can be done a bunch of different ways using sample().
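One of those other ways, as a rough sketch: sample cell positions and set them to NA one cell at a time, so the column names and each column's type are kept. The helper name add_random_na and its prop argument are made up for illustration; they are not from any package.

```r
# Hypothetical helper (name and arguments are assumptions):
# set a fixed proportion of cells to NA, chosen at random.
add_random_na <- function(data, prop = 0.2) {
  n_cells <- nrow(data) * ncol(data)
  n_missing <- round(n_cells * prop)
  # Sample cell positions (1..n_cells) without replacement
  picked <- sample(n_cells, n_missing)
  # Convert linear positions into (row, column) pairs
  idx <- arrayInd(picked, .dim = dim(data))
  # Assign NA cell by cell so each column keeps its own type
  for (k in seq_len(nrow(idx))) {
    data[idx[k, 1], idx[k, 2]] <- NA
  }
  data
}

set.seed(42)  # only to make the example reproducible
df <- data.frame(A = 1:10, B = 11:20, C = 21:30)
df_na <- add_random_na(df, prop = 0.2)
sum(is.na(df_na))  # 6, i.e. 20% of the 30 cells
```

Because the positions are drawn without replacement, this gives exactly round(prop * number of cells) missing values rather than an approximate share. Which cells end up missing depends on the seed.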