Sample data:
df <- data.frame("ID" = 1:6,
"Group1" = c("A", NA, "C", NA, "E", "C"),
"Group2" = c("E", "C", "C", NA, "E", "E"),
"Group3" = c("A", "A", NA, NA, "C", NA),
"Group4" = c(NA, "C", NA, "D", "C", NA),
"Group5" = c("A", "D", NA, NA, NA, NA))
In each row, I want to count the number of each value and store the most frequent value in a new variable, New.Group
. In case of ties, the first value in the row should be selected. The logic applied to the example:
Row 1 of New.Group
takes value A
because it is most frequent value in the row, ignoring NA
s.
Row 2 takes value C
because it is also the most frequent value.
Row 3 the same as Row 2.
Row 4 takes value D
because it's the only value in the row.
In Row 5 both E
and C
has count 2, but E
is selected because it is encountered before C
in the row.
Row 6, similar to row 5, both C
and E
has count 1, but C
is selected because it is encountered before E
in the row.
The desired output:
ID Group1 Group2 Group3 Group4 Group5 New.Group
1 1 A E A <NA> A A
2 2 <NA> C A C D C
3 3 C C <NA> <NA> <NA> C
4 4 <NA> <NA> <NA> D <NA> D
5 5 E E C C <NA> E
6 6 C E <NA> <NA> <NA> C
How do you find the most repeated value in a column SQL? select cnt1. column_name from (select COUNT(*) as total, column_name from table_name group by column_name) cnt1, (select MAX(total) as maxtotal from (select COUNT(*) as total, column_name from table_name group by column_name)) cnt2 where cnt1.
Mode is the highest occurring figure in a series. It is the value in a series of observation that repeats maximum number of times and which represents the whole series as most of the values in the series revolves around this value. Therefore, mode is the value that occurs the most frequent times in a series.
To find the most frequent factor value in an R data frame column, we can use names function with which. max function after creating the table for the particular column. This might be required while doing factorial analysis and we want to know which factor occurs the most.
How do I find the most common name in SQL? SELECT <column_name>, COUNT(<column_name>) AS `value_occurrence` FROM <my_table> GROUP BY <column_name> ORDER BY `value_occurrence` DESC LIMIT 1; Replace <column_name> and <my_table> . Increase 1 if you want to see the N most common values of the column.
I think this achieves what you're looking for. For each row, it creates a table of frequencies of each letter and chooses the largest, whilst preserving column order for ties. It then returns the name of the first column in this table.
Thanks to Henrik for suggesting the improvement.
df$New.Group <- apply(df[-1], 1, function(x) {
names(which.max(table(factor(x, unique(x)))))
})
df
#> ID Group1 Group2 Group3 Group4 Group5 New.Group
#> 1 1 A E A <NA> A A
#> 2 2 <NA> C A C D C
#> 3 3 C C <NA> <NA> <NA> C
#> 4 4 <NA> <NA> <NA> D <NA> D
#> 5 5 E E C C <NA> E
#> 6 6 C E <NA> <NA> <NA> C
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With