Get the most frequent value per row and account for ties [duplicate]

Tags: r, count, recode

Sample data:

df <- data.frame("ID" = 1:6, 
                 "Group1" = c("A", NA, "C", NA, "E", "C"), 
                 "Group2" = c("E", "C", "C", NA, "E", "E"),
                 "Group3" = c("A", "A", NA, NA, "C", NA),
                 "Group4" = c(NA, "C", NA, "D", "C", NA),
                 "Group5" = c("A", "D", NA, NA, NA, NA))

In each row, I want to count the number of each value and store the most frequent value in a new variable, New.Group. In case of ties, the first value in the row should be selected. The logic applied to the example:

Row 1 of New.Group takes value A because it is the most frequent value in the row, ignoring NAs.

Row 2 takes value C because it is also the most frequent value.

Row 3 the same as Row 2.

Row 4 takes value D because it's the only value in the row.

In Row 5, both E and C have a count of 2, but E is selected because it is encountered before C in the row.

Row 6 is similar to Row 5: both C and E have a count of 1, but C is selected because it is encountered before E in the row.

The desired output:

  ID Group1 Group2 Group3 Group4 Group5 New.Group
1  1      A      E      A   <NA>      A         A
2  2   <NA>      C      A      C      D         C
3  3      C      C   <NA>   <NA>   <NA>         C
4  4   <NA>   <NA>   <NA>      D   <NA>         D
5  5      E      E      C      C   <NA>         E
6  6      C      E   <NA>   <NA>   <NA>         C


asked Jul 21 '20 17:07

Laura

1 Answer

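I think this achieves what you're looking for. For each row, it converts the values to a factor whose levels follow the order in which the values first appear in the row, then builds a frequency table of those levels (NAs are not counted). which.max() returns the position of the first maximum in that table, so ties go to the value that appears first in the row, and names() extracts that value.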
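To see why ties resolve to the first value in the row, the steps can be traced on a single row. This is only an illustrative sketch: the vector x below is row 5 of the sample data typed out by hand (without the ID column), and the intermediate names f and counts are introduced here for the trace.

```r
# Row 5 of the sample data, without the ID column
x <- c("E", "E", "C", "C", NA)

# Levels follow the order of first appearance; factor() drops NA from the levels
f <- factor(x, levels = unique(x))
levels(f)
#> [1] "E" "C"

# table() counts each level in level order; E and C both have a count of 2
counts <- table(f)

# which.max() returns the first maximum, so the tie goes to "E"
names(which.max(counts))
#> [1] "E"

# For comparison: without the explicit levels, table() sorts the values
# alphabetically, so the same tie would go to "C" instead
names(which.max(table(x)))
#> [1] "C"
```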

Thanks to Henrik for suggesting the improvement.

df$New.Group <- apply(df[-1], 1, function(x) {
  names(which.max(table(factor(x, unique(x)))))
})

df
#>   ID Group1 Group2 Group3 Group4 Group5 New.Group
#> 1  1      A      E      A   <NA>      A         A
#> 2  2   <NA>      C      A      C      D         C
#> 3  3      C      C   <NA>   <NA>   <NA>         C
#> 4  4   <NA>   <NA>   <NA>      D   <NA>         D
#> 5  5      E      E      C      C   <NA>         E
#> 6  6      C      E   <NA>   <NA>   <NA>         C
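One case the sample data does not cover is a row in which every Group column is NA. There the frequency table is empty and the function returns a zero-length result, so apply() may no longer give back a plain character vector. A minimal sketch of a guarded variant follows; the helper name first_mode is hypothetical and not part of the original answer, and returning NA for an all-NA row is an assumption about the desired behaviour.

```r
# Hypothetical helper (not from the original answer): most frequent non-NA
# value in a vector, with ties broken by order of first appearance.
first_mode <- function(x) {
  x <- x[!is.na(x)]
  # Assumption: a row with no values at all should yield NA
  if (length(x) == 0) return(NA_character_)
  names(which.max(table(factor(x, levels = unique(x)))))
}

df <- data.frame(ID = 1:6,
                 Group1 = c("A", NA, "C", NA, "E", "C"),
                 Group2 = c("E", "C", "C", NA, "E", "E"),
                 Group3 = c("A", "A", NA, NA, "C", NA),
                 Group4 = c(NA, "C", NA, "D", "C", NA),
                 Group5 = c("A", "D", NA, NA, NA, NA))

# Same row-wise call as above; New.Group holds A, C, C, D, E, C
df$New.Group <- apply(df[-1], 1, first_mode)

# An all-NA input now gives NA instead of a zero-length result
first_mode(c(NA_character_, NA_character_))
#> [1] NA
```

Because first_mode always returns exactly one string, apply() returns a character vector of length nrow(df) regardless of how many rows are empty.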


answered Oct 28 '22 10:10

Allan Cameron
