Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting unique values based on two columns with repeated rows, R data frame

I have an R data frame with the following format:

column1    column2
NA         NA
1          A
1          A
1          A
NA         NA
NA         NA
2          B
2          B
NA         NA
NA         NA
3          A
3          A
3          A

df = structure(list(column1 = c(NA, 1L, 1L, 1L, NA, NA, 2L, 2L, NA, 
NA, 3L, 3L, 3L), column2 = c(NA, "A", "A", "A", NA, NA, "B", 
"B", NA, NA, "A", "A", "A")), .Names = c("column1", "column2"
), row.names = c(NA, -13L), class = "data.frame")

If the row in one column has an NA, the other column has an NA. The numerical value in column1 describes a unique group, e.g. rows 2-4 have the group 1. The column column2 describes the identity of this grouping. In this data frame, the identity is either A, B, C, or D.

My goal is to tally the number of identities by group within the entire data frame: how many A groups there are, how many B groups, etc.

The correct output for this file (so far) is there are 2 A groups and 1 B group.

How would I calculate this?

At the moment, I would try something like this:

length(df[df$column2 == "B"]) ## outputs 2 

but this is incorrect. If I combined column1 and column2, took only unique values 1A, 2B, 3A, I guess I could count how many times each label from column2 occurs?

(If it's easier, I'm happy to use data.table for this task.)

like image 984
ShanZhengYang Avatar asked Jan 30 '26 02:01

ShanZhengYang


1 Answers

You can use rle for runs and table for tabulation:

table(rle(df$column2)$values)

# A B 
# 2 1 

See ?rle and ?table for details.

Or, if you want to take advantage of column1 (which is derived from column2):

table(unique(df)$column2)
like image 115
Frank Avatar answered Jan 31 '26 20:01

Frank