
Get the most frequent value per row and account for ties [duplicate]

Tags: r, count, recode

Sample data:

df <- data.frame("ID" = 1:6, 
                 "Group1" = c("A", NA, "C", NA, "E", "C"), 
                 "Group2" = c("E", "C", "C", NA, "E", "E"),
                 "Group3" = c("A", "A", NA, NA, "C", NA),
                 "Group4" = c(NA, "C", NA, "D", "C", NA),
                 "Group5" = c("A", "D", NA, NA, NA, NA))

In each row, I want to count how often each value occurs and store the most frequent value in a new variable, New.Group. In case of ties, the value that appears first in the row should be selected. The logic applied to the example:

Row 1 of New.Group takes value A because it is the most frequent value in the row, ignoring NAs.

Row 2 takes value C because it is the most frequent value in that row.

Row 3 takes value C for the same reason as Row 2.

Row 4 takes value D because it's the only value in the row.

In Row 5, both E and C have count 2, but E is selected because it is encountered before C in the row.

In Row 6, similar to Row 5, both C and E have count 1, but C is selected because it is encountered before E in the row.

The desired output:

  ID Group1 Group2 Group3 Group4 Group5 New.Group
1  1      A      E      A   <NA>      A         A
2  2   <NA>      C      A      C      D         C
3  3      C      C   <NA>   <NA>   <NA>         C
4  4   <NA>   <NA>   <NA>      D   <NA>         D
5  5      E      E      C      C   <NA>         E
6  6      C      E   <NA>   <NA>   <NA>         C
Laura asked Jul 21 '20 17:07


1 Answer

I think this achieves what you're looking for. For each row, it creates a frequency table of the values, with the table entries ordered by first appearance in the row so that ties resolve in column order. It then returns the name of the first entry in this table with the highest count.

Thanks to Henrik for suggesting the improvement.

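df$New.Group <- apply(df[-1], 1, function(x) {
  # Levels follow first appearance in the row (NA is dropped), so which.max()
  # resolves ties in favour of the value encountered first
  names(which.max(table(factor(x, unique(x)))))
})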

df
#>   ID Group1 Group2 Group3 Group4 Group5 New.Group
#> 1  1      A      E      A   <NA>      A         A
#> 2  2   <NA>      C      A      C      D         C
#> 3  3      C      C   <NA>   <NA>   <NA>         C
#> 4  4   <NA>   <NA>   <NA>      D   <NA>         D
#> 5  5      E      E      C      C   <NA>         E
#> 6  6      C      E   <NA>   <NA>   <NA>         C
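
To see why the tie-breaking works, it may help to unpack the one-liner for a single row. The sketch below walks through row 5 of the sample data step by step. The intermediate names (first_seen, f, counts, winner) and the helper most_frequent_first() are not part of the original answer; they are hypothetical names introduced here for illustration only.

```r
# Row 5 of the sample data, as the character vector apply() would pass in
x <- c(Group1 = "E", Group2 = "E", Group3 = "C", Group4 = "C", Group5 = NA)

# 1. Distinct values in order of first appearance (NA is still included here)
first_seen <- unique(x)

# 2. A factor whose levels follow that order; factor() excludes NA
#    from the levels by default
f <- factor(x, levels = first_seen)

# 3. Counts per level, in level order; NA entries are not counted
counts <- table(f)

# 4. which.max() returns the first maximum, so the tie between E and C
#    resolves to the value seen first in the row
winner <- names(which.max(counts))

# Hypothetical helper (an assumption, not from the answer): same logic,
# but returns NA for a row that contains nothing except NA
most_frequent_first <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  names(which.max(table(factor(x, levels = unique(x)))))
}
```

The sample data has no row consisting only of NA, so the one-liner above is sufficient for it. If such rows can occur in real data, a guarded helper along these lines, used with something like vapply() over the rows, would be one way to keep the result a plain character vector.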
Allan Cameron answered Oct 28 '22 10:10