I want to delete all columns or rows with more than 50% NAs in a data frame.
This is my solution:
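    # delete columns with more than 50% missing values
    miss <- c()
    for (i in seq_len(ncol(data))) {
      if (sum(is.na(data[, i])) > 0.5 * nrow(data)) miss <- append(miss, i)
    }
    # guard: data[, -miss] would drop every column if miss were empty
    data2 <- if (length(miss) > 0) data[, -miss, drop = FALSE] else data

    # delete rows with more than 50% missing values
    miss2 <- c()
    for (i in seq_len(nrow(data))) {
      if (sum(is.na(data[i, ])) > 0.5 * ncol(data)) miss2 <- append(miss2, i)
    }
    # use the row indices (miss2), not the column indices (miss)
    if (length(miss2) > 0) data2 <- data2[-miss2, , drop = FALSE]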
but I'm looking for a nicer/faster solution.
I would also appreciate a dplyr solution.
In multivariate analysis, if cases have a large number of missing values, it can be better to drop those cases and replace them than to impute the missing values. In univariate analysis, on the other hand, imputation can reduce the bias in the data if the values are missing at random.
In pandas, the dropna() function is used to remove missing values. Its axis parameter determines whether rows or columns that contain missing values are removed: 0 or 'index' drops rows that contain missing values, and 1 or 'columns' drops columns that contain missing values.
The overall percentage of data that is missing is important. Generally, if less than 5% of values are missing then it is acceptable to ignore them (REF). However, the overall percentage missing alone is not enough; you also need to pay attention to which data is missing.
DataFrame.dropna() also gives you the option to remove rows based on null or missing values in specified columns only. To search for null values in specific columns, pass the column labels to the subset parameter, either as a single label or as a list of labels.
To remove columns with some amount of NA, you can use colMeans(is.na(...))
    ## Some sample data
    set.seed(0)
    dat <- matrix(1:100, 10, 10)
    dat[sample(1:100, 50)] <- NA
    dat <- data.frame(dat)

    ## Remove columns with more than 50% NA
    ## (is.na(...) <= 0.5 keeps a column that is exactly 50% NA)
    dat[, which(colMeans(is.na(dat)) <= 0.5)]

    ## Remove rows with more than 50% NA
    dat[which(rowMeans(is.na(dat)) <= 0.5), ]

    ## Remove columns and rows with more than 50% NA
    dat[which(rowMeans(is.na(dat)) <= 0.5), which(colMeans(is.na(dat)) <= 0.5)]
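Since the question also asks for dplyr, here is a minimal sketch of the same idea. It assumes dplyr 1.0.0 or later (for `where()` inside `select()`) and the magrittr pipe `%>%`, whose `.` placeholder stands for the data frame being piped in. The names `cols_kept`, `rows_kept` and `both` are only illustrative, and the sample data is rebuilt so the block runs on its own.

```r
library(dplyr)

## Same sample data as above
set.seed(0)
dat <- matrix(1:100, 10, 10)
dat[sample(1:100, 50)] <- NA
dat <- data.frame(dat)

## Keep columns in which at most 50% of the values are NA
cols_kept <- dat %>%
  select(where(~ mean(is.na(.x)) <= 0.5))

## Keep rows in which at most 50% of the values are NA
## (`.` is the piped-in data frame, so this works with %>% but not with |>)
rows_kept <- dat %>%
  filter(rowMeans(is.na(.)) <= 0.5)

## Both at once: rows are judged on all original columns,
## then the columns selected above are kept
both <- dat %>%
  filter(rowMeans(is.na(.)) <= 0.5) %>%
  select(all_of(names(cols_kept)))
```

In the combined version both fractions are computed on the original data, as in the base R one-liner. Chaining `select(where(...))` before `filter(...)` instead would judge the rows on the remaining columns only, which can give a different result.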