Find the most frequent value by row

Tags: r, count, mode, factors

My problem is as follows:

I have a data set containing several factor variables that share the same categories. For each row, I need to find the category that occurs most frequently. In case of ties an arbitrary value can be chosen, although it would be great to have more control over that.

My data set contains over a hundred factors, but the structure is something like this:

df = data.frame(id = 1:3,
                var1 = c("red","yellow","green"),
                var2 = c("red","yellow","green"),
                var3 = c("yellow","orange","green"),
                var4 = c("orange","green","yellow"))

df
#   id   var1   var2   var3   var4
# 1  1    red    red yellow orange
# 2  2 yellow yellow orange  green
# 3  3  green  green  green yellow

The solution should be a variable within the data frame, for example var5, that contains the most frequent category for each row. It can be a factor or a numeric vector (in case the data need to be converted to numeric vectors first).

In this case, I would like to have this solution:

df$var5
# [1] "red"    "yellow" "green"

Any advice will be much appreciated! Thanks in advance!

asked Nov 14 '13 16:11

ZMacarozzi

1 Answer

Something like this:

# df[-1] drops the id column so that only var1..var4 are counted
apply(df[-1],1,function(x) names(which.max(table(x))))
[1] "red"    "yellow" "green"
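To store the result in the data frame as var5, as the question asks, the call can be assigned to a new column. The following is only a sketch: it assumes that every column to scan is named "var" followed by a number, and the `vars` helper variable is introduced here for illustration.

```r
df <- data.frame(id = 1:3,
                 var1 = c("red", "yellow", "green"),
                 var2 = c("red", "yellow", "green"),
                 var3 = c("yellow", "orange", "green"),
                 var4 = c("orange", "green", "yellow"))

# Assumption: the columns of interest are all named "var<number>"
vars <- grep("^var[0-9]+$", names(df), value = TRUE)

# Most frequent category per row, computed before var5 exists
df$var5 <- apply(df[vars], 1, function(x) names(which.max(table(x))))
df$var5
```

Selecting the columns by name rather than by position keeps the id column (and any other non-category column) out of the count.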

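In case there is a tie, which.max takes the first max value. Note that "first" refers to the order of the table, which is sorted by category name (or by level order for factors), not to the order in which the values appear in the row. From the which.max help page: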

Determines the location, i.e., index of the (first) minimum or maximum of a numeric vector.
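A small illustration of that tie behaviour, using a made-up vector `x` that is not part of the data set above:

```r
x <- c("yellow", "red", "yellow", "red")

# table() sorts the categories, so "red" comes before "yellow"
res_sorted <- names(which.max(table(x)))
res_sorted   # "red", although "yellow" appears first in x

# With explicit factor levels the table follows the level order instead
f <- factor(x, levels = c("yellow", "red"))
res_levels <- names(which.max(table(f)))
res_levels   # "yellow"
```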

Example:

# Replace var4 in the existing df so that row 1 has a tie between red and yellow
df$var4 <- c("yellow","green","yellow")

> df
  id   var1   var2   var3   var4
1  1    red    red yellow yellow
2  2 yellow yellow orange  green
3  3  green  green  green yellow

apply(df[-1],1,function(x) names(which.max(table(x))))
[1] "red"    "yellow" "green"
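If more control over ties is wanted, one possible approach (a sketch, not the only way) is to fix the level order before tabulating, so that which.max prefers whichever category is listed first. The helper `row_mode` and the `priority` vector are hypothetical names introduced here.

```r
df <- data.frame(id = 1:3,
                 var1 = c("red", "yellow", "green"),
                 var2 = c("red", "yellow", "green"),
                 var3 = c("yellow", "orange", "green"),
                 var4 = c("yellow", "green", "yellow"))

# Hypothetical helper: most frequent value in x, ties broken by `priority`.
# Values of x that are not listed in `priority` are ignored.
row_mode <- function(x, priority) {
  counts <- table(factor(x, levels = priority))
  names(counts)[which.max(counts)]
}

# Categories listed earlier win a tie
priority <- c("yellow", "red", "green", "orange")
res <- apply(df[-1], 1, row_mode, priority = priority)
res
```

With this priority, row 1 (two red, two yellow) resolves to "yellow" instead of "red"; rows 2 and 3 have no tie and are unaffected.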

answered Oct 05 '22 23:10

Chargaff
