I am trying to find the percentage of NAs in each column as well as in the whole data frame.
The first method, which I have commented out, gives me zero, and the second method, which is not commented out, gives me a matrix. I am not sure what I am missing. Any hint is truly appreciated!
cp.2006<-read.csv(file="cp2006.csv",head=TRUE)
#countNAs <- function(x) {
# sum(is.na(x))
#}
#total=0
#for (i in col(cp.2006)) {
# total=countNAs(i)+total
#}
#print(total)
count<-apply(cp.2006, 1, function(x) sum(is.na(x)))
dims<-dim(cp.2006)
num<-dims[1]*dims[2]
NApercentage<-(count/num) * 100
print(NApercentage)
x = data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))
For the whole data frame (this gives a proportion; multiply by 100 for a percentage):
sum(is.na(x))/prod(dim(x))
Or
mean(is.na(x))
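As a rough sketch of how this maps back to the question, the snippet below turns the proportion into a percentage and also shows why the commented-out loop printed zero. It reuses the small example data frame from this answer; the names total_pct and index_nas are only illustrative.

```r
# Example data frame from the answer (3 of its 8 cells are NA)
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# is.na(x) is a logical matrix; its mean is the share of TRUE cells
total_pct <- 100 * mean(is.na(x))
print(total_pct)  # 37.5

# col(x) returns a matrix of column indices (1, 1, ..., 2, 2, ...),
# not the data itself, so counting NAs in it always gives 0
index_nas <- sum(is.na(col(x)))
print(index_nas)  # 0
```

So the loop in the question was iterating over column index numbers rather than over the columns' values, which is why the total never moved from zero.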
For columns:
apply(x, 2, function(col)sum(is.na(col))/length(col))
Or
colMeans(is.na(x))
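For percentages per column, a minimal sketch along the same lines is below. It also shows what the uncommented code in the question was computing: apply with margin 1 works row by row, so it returns one count per row instead of a single number. The names col_pct and row_counts are only illustrative.

```r
# Example data frame from the answer
x <- data.frame(x = c(1, 2, NA, 3), y = c(NA, NA, 4, 5))

# Percentage of NAs in each column: 25 for x, 50 for y
col_pct <- 100 * colMeans(is.na(x))
print(col_pct)

# Margin 1 means "per row": this yields one NA count per row
# (1, 1, 1, 0 here), not a total for the data frame
row_counts <- apply(x, 1, function(r) sum(is.na(r)))
print(row_counts)
```

Summing row_counts and dividing by prod(dim(x)) would give the same overall proportion as mean(is.na(x)).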