Majority vote in R

3 Answers

You could use two things here. First, this is how you get the most frequent item in a vector:

> v = c(1,1,1,2,2)
> names(which.max(table(v)))
[1] "1"

This is a character value, but we can easily apply as.numeric to it if necessary.
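As a minimal sketch, the two steps above can be wrapped in a small helper. The name `most_frequent` is hypothetical (it does not come from any package), and it assumes the values are numeric so that `as.numeric` is safe. On a tie, `which.max` returns the first maximum, which for numeric input is the smallest tied value because `table` orders its entries.

```r
# Hypothetical helper: most frequent value of a numeric vector.
most_frequent <- function(v) {
  counts <- table(v)                      # frequency of each distinct value
  as.numeric(names(which.max(counts)))    # label of the first maximum, as a number
}

most_frequent(c(1, 1, 1, 2, 2))
```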

Once we know how to do that, we can use the grouping functionality of the data.table package to perform a per-item evaluation of what its most frequent category is. Here is the code for your example above:

> library(data.table)
> dt = data.table(item=c(1,1,1,1,2,2,2,2), category=c(2,3,2,2,2,3,1,1))
> dt
   item category
1:    1        2
2:    1        3
3:    1        2
4:    1        2
5:    2        2
6:    2        3
7:    2        1
8:    2        1
> dt[,as.numeric(names(which.max(table(category)))),by=item]
   item V1
1:    1  2
2:    2  1

The new V1 column contains the numeric version of the most frequent category for each item. If you want to give it a proper name, the syntax is a little uglier:

> dt[,list(mostFreqCat=as.numeric(names(which.max(table(category))))),by=item]
   item mostFreqCat
1:    1           2
2:    2           1

answered Sep 23 '22 04:09

asieira

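A one-liner using plyr (load it with library(plyr) first). It assumes the categories are positive integers, because tabulate counts by integer position: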

ddply(dt, .(item), function(x) which.max(tabulate(x$category)))
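One caveat worth sketching: `tabulate` counts by integer position, so `which.max(tabulate(x))` returns the category itself only when the categories are positive integers. For a factor, the result is the index of a level and has to be mapped back through `levels`. The variable names below are made up for illustration.

```r
# Positive integer categories: the position in tabulate() equals the category.
ints <- c(10, 10, 20)
which.max(tabulate(ints))                            # 10

# Factor categories: count the integer codes, then map the index to its label.
f   <- factor(c("b", "b", "a"))
idx <- which.max(tabulate(as.integer(f), nbins = nlevels(f)))
levels(f)[idx]                                       # "b"
```

For non-integer or negative categories, the `table`-based approach from the first answer is the safer choice.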

answered Sep 24 '22 04:09

topchef

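 # dat is assumed to be a data frame with columns item and category
 tdat <- tapply(dat$category, dat$item,
                function(vec) sort(table(vec), decreasing=TRUE)[1])
 # [1] keeps the top count; use names(...)[1] instead to get the winning category
 data.frame(item=rownames(tdat), plurality_vote=tdat)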

  item plurality_vote
1    1              3
2    2              2

A more complex function would be needed to distinguish a plurality (possibly with ties) from a true majority.
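A rough sketch of what such a function might look like follows. The name `vote_summary` and the shape of its return value are assumptions made for illustration, not part of the original answer. It treats a "true majority" as strictly more than half of the non-missing votes.

```r
# Hypothetical helper: classify the vote within one group.
vote_summary <- function(vec) {
  counts  <- table(vec)                               # NA values are dropped by table()
  top     <- max(counts)
  winners <- names(counts)[as.vector(counts) == top]  # more than one entry means a tie
  list(
    winners  = winners,                  # winning category label(s), as character
    tie      = length(winners) > 1,      # TRUE if several categories share the top count
    majority = top > sum(counts) / 2     # TRUE only for a strict majority
  )
}

vote_summary(c(2, 3, 2, 2))   # single winner "2" with a true majority (3 of 4)
vote_summary(c(2, 3, 1, 1))   # single winner "1", plurality only (2 of 4)
vote_summary(c(1, 1, 2, 2))   # tie between "1" and "2"
```

It could be applied per item in the same way as above, for example with `tapply(dat$category, dat$item, vote_summary)`, which should yield one summary list per item.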

answered Sep 26 '22 04:09

IRTFM
