I'm using the airquality data set available in R, and attempting to count the number of rows within the data that do not contain any NAs, while aggregating by Month.
The data looks like this:
head(airquality)
# Ozone Solar.R Wind Temp Month Day
# 1 41 190 7.4 67 5 1
# 2 36 118 8.0 72 5 2
# 3 12 149 12.6 74 5 3
# 4 18 313 11.5 62 5 4
# 5 NA NA 14.3 56 5 5
# 6 28 NA 14.9 66 5 6
As you can see, I have NAs in columns Ozone and Solar.R. I used the function complete.cases as follows:
x <- airquality[,1] # for the Ozone
y <- airquality[,2] # for the Solar.R
ok <- complete.cases(x,y)
And then to check:
nrow(airquality)
# [1] 153
sum(!ok)
# [1] 42
sum(ok)
# [1] 111
which is great.
But now I'd like to pull that data apart by Month (column 5), and this is where I'm running into problems: trying to aggregate or sort by the value in column 5 (Month).
I was able to get the following to run. It doesn't sort by Month yet (I just wanted to make sure I could get the function to run):
aggregate(x = sum(complete.cases(airquality)), by= list(nrow(airquality)), FUN = sum)
# Group.1 x
# 1 153 111
OK... so to sort it out. I am trying to use the by part of the aggregate function to sort. I tried several variations of column 5 within airquality:
- airquality[,5]
- airquality[,"Month"]
I get these errors:
aggregate(x = sum(complete.cases(airquality)), by= list(airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) :
# arguments must have same length
aggregate(x = sum(complete.cases(airquality)), by=
list(sum(complete.cases(airquality)),airquality[,5]), FUN = sum)
# Error in aggregate.data.frame(as.data.frame(x), ...) :
# arguments must have same length
I tried to search further in the ?aggregate help page, namely the by part:
by - a list of grouping elements, each as long as the variables in the data frame x. The elements are coerced to factors before use.
I looked up ?factor, but can't see how to apply it (if it is even necessary in this case). I also tried putting break = into the call, but that didn't work.
None of the "Questions that may already have your answer" seem to apply, many of which give solutions in C# and SQL.
Edit: Expected outcome
Count Month
24 5
9 6
26 7
23 8
29 9
As an addition to the other answers, you could do it with dplyr.
library(dplyr)
airquality %>%
  group_by(Month) %>%
  summarize(incomplete = sum(!complete.cases(Ozone, Solar.R)),
            complete = sum(complete.cases(Ozone, Solar.R)))
# Month incomplete complete
#1 5 7 24
#2 6 21 9
#3 7 5 26
#4 8 8 23
#5 9 1 29
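For comparison, here is a minimal base R sketch of the same count using aggregate. The errors in the question most likely come from passing sum(complete.cases(airquality)), a single number, as x while by has 153 elements; aggregate needs x and every element of by to have one entry per row. The names ok and counts below are only illustrative, and this assumes (as in the question) that Ozone and Solar.R are the only columns to check.

```r
# Base R sketch: count complete and incomplete rows per Month with aggregate().
data(airquality)

# One logical value per row: TRUE when both Ozone and Solar.R are present.
ok <- complete.cases(airquality$Ozone, airquality$Solar.R)

# x has 153 rows and the grouping vector has 153 elements, so the lengths match.
# Summing a logical vector counts its TRUE values within each group.
counts <- aggregate(x = data.frame(complete = ok, incomplete = !ok),
                    by = list(Month = airquality$Month),
                    FUN = sum)
print(counts)
```

The per-month sums are taken over the row-level logical vector rather than over its already-summed total, which is what lets by split the rows into groups before FUN is applied.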