I have an R data frame with the following format:
column1 column2
NA NA
1 A
1 A
1 A
NA NA
NA NA
2 B
2 B
NA NA
NA NA
3 A
3 A
3 A
df = structure(list(column1 = c(NA, 1L, 1L, 1L, NA, NA, 2L, 2L, NA,
NA, 3L, 3L, 3L), column2 = c(NA, "A", "A", "A", NA, NA, "B",
"B", NA, NA, "A", "A", "A")), .Names = c("column1", "column2"
), row.names = c(NA, -13L), class = "data.frame")
If a row has NA in one column, it also has NA in the other.
The numerical value in column1 describes a unique group, e.g. rows 2-4 have the group 1. The column column2 describes the identity of this grouping. In this data frame, the identity is either A, B, C, or D.
My goal is to tally the number of identities by group within the entire data frame: how many A groups there are, how many B groups, etc.
The correct output for this data (so far) is 2 A groups and 1 B group.
How would I calculate this?
At the moment, I would try something like this:
length(df[df$column2 == "B"]) ## outputs 2
but this is incorrect. If I combined column1 and column2, took only unique values 1A, 2B, 3A, I guess I could count how many times each label from column2 occurs?
(If it's easier, I'm happy to use data.table for this task.)
You can use rle for runs and table for tabulation:
table(rle(df$column2)$values)
# A B
# 2 1
See ?rle and ?table for details.
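To see why this works, it can help to look at the intermediate rle result. The following is a minimal sketch that rebuilds the df from the question (stringsAsFactors = FALSE is set so that column2 stays a character vector, since rle does not accept factors). It relies on two behaviours: rle treats every NA as its own run, and table drops NA by default. Note the assumption that groups are always separated by NA rows; two adjacent groups with the same identity and no NA between them would be merged into a single run.

```r
# Rebuild the example data from the question
df <- data.frame(
  column1 = c(NA, 1L, 1L, 1L, NA, NA, 2L, 2L, NA, NA, 3L, 3L, 3L),
  column2 = c(NA, "A", "A", "A", NA, NA, "B", "B", NA, NA, "A", "A", "A"),
  stringsAsFactors = FALSE
)

# Run-length encoding: one entry per run of identical consecutive values
runs <- rle(df$column2)
runs$values   # NA "A" NA NA "B" NA NA "A" (each NA is a separate run)
runs$lengths  # 1 3 1 1 2 1 1 3

# table() ignores the NA runs, leaving one count per non-NA run
counts <- table(runs$values)
counts
```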
Or, if you want to take advantage of column1 (which already identifies each group), tabulate the distinct rows:
table(unique(df)$column2)
# A B
# 2 1
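A slightly more explicit variant of the same idea, sketched here under the assumption that column1 uniquely numbers each group: drop the NA rows first, keep one row per (group, identity) pair, then count the identities. Unlike the rle approach, this does not depend on groups being separated by NA rows. The name pairs is just an illustrative variable.

```r
# Rebuild the example data from the question
df <- data.frame(
  column1 = c(NA, 1L, 1L, 1L, NA, NA, 2L, 2L, NA, NA, 3L, 3L, 3L),
  column2 = c(NA, "A", "A", "A", NA, NA, "B", "B", NA, NA, "A", "A", "A"),
  stringsAsFactors = FALSE
)

# Keep only rows that belong to a group, then one row per (group, identity) pair
pairs <- unique(df[!is.na(df$column1), ])

# Count how many groups carry each identity
counts <- table(pairs$column2)
counts
```

Since data.table was mentioned as an option, the equivalent there should be something along the lines of unique(na.omit(dt))[, .N, by = column2], where dt is assumed to be as.data.table(df).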