Majority vote in R

Tags: r

I need to calculate the majority vote for an item in R and I don't have a clue how to approach this.

I have a data frame with items and assigned categories. What I need is the category that was assigned the most often. How do I do this?

Data frame:

item   category
1      2
1      3
1      2
1      2
2      2
2      3
2      1
2      1

Result should be:

item   majority_vote
1      2
2      1

nantoki asked Jun 19 '13 21:06


3 Answers

You could use two things here. First, this is how you get the most frequent item in a vector:

> v = c(1,1,1,2,2)
> names(which.max(table(v)))
[1] "1"

The result is a character value, but we can easily convert it with as.numeric if necessary.
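
If the round trip through a character name is unwanted, a small helper can return the most frequent value in its original type. The following is a minimal sketch: `most_frequent` is a hypothetical name (not a base R function), and on ties it returns the value that appears first in the vector.

```r
# Hypothetical helper: most frequent value of a vector, keeping its type.
most_frequent <- function(v) {
  ux <- unique(v)                    # distinct values, in order of first appearance
  counts <- tabulate(match(v, ux))   # how often each distinct value occurs
  ux[which.max(counts)]              # which.max picks the first maximum on ties
}

v <- c(1, 1, 1, 2, 2)
most_frequent(v)
```

Because the helper indexes into `unique(v)`, it also works for character or factor categories without any conversion.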

Once we know how to do that, we can use the grouping functionality of the data.table package to perform a per-item evaluation of what its most frequent category is. Here is the code for your example above:

> library(data.table)
> dt = data.table(item=c(1,1,1,1,2,2,2,2), category=c(2,3,2,2,2,3,1,1))
> dt
   item category
1:    1        2
2:    1        3
3:    1        2
4:    1        2
5:    2        2
6:    2        3
7:    2        1
8:    2        1
> dt[,as.numeric(names(which.max(table(category)))),by=item]
   item V1
1:    1  2
2:    2  1

The new V1 column contains the numeric version of the most frequent category for each item. If you want to give it a proper name, the syntax is a little uglier:

> dt[,list(mostFreqCat=as.numeric(names(which.max(table(category))))),by=item]
   item mostFreqCat
1:    1           2
2:    2           1
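
For readers who would rather not depend on data.table, the same per-item grouping can probably be sketched in base R with `aggregate`. This is only an illustration under that assumption, not a replacement for the approach above; the names `df` and `res` are made up here.

```r
# Same example data as above, as a plain data frame.
df <- data.frame(item = c(1, 1, 1, 1, 2, 2, 2, 2),
                 category = c(2, 3, 2, 2, 2, 3, 1, 1))

# For each item, take the name of the largest table() count
# and convert it back to a number.
res <- aggregate(category ~ item, data = df,
                 FUN = function(x) as.numeric(names(which.max(table(x)))))
names(res)[2] <- "majority_vote"
res
```

As with the data.table version, `which.max` returns the first maximum, so a tie goes to the smallest category value.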

asieira answered Sep 23 '22 04:09


One-liner using plyr (load it first with library(plyr)):

ddply(dt, .(item), function(x) which.max(tabulate(x$category)))
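
One caveat worth checking before relying on this: `tabulate` counts only positive integers and returns a bin position rather than a label. That fits the example data, where categories are 1, 2 and 3, but not zero-valued or character categories. A short sketch of the difference, using a made-up vector `cats`:

```r
cats <- c(0, 0, 0, 5, 5)

# tabulate() silently ignores the zeros, so the winner appears to be 5.
which.max(tabulate(cats))

# table() counts every distinct value, so the winner is "0".
names(which.max(table(cats)))
```

If the categories are not guaranteed to be positive integers, the `table`-based form is the safer choice.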

topchef answered Sep 24 '22 04:09


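 # dat is the data frame from the question.
 # names(...)[1] picks the winning category; without names() the result is its count.
 tdat <- tapply(dat$category, dat$item,
                function(vec) names(sort(table(vec), decreasing=TRUE))[1])
 data.frame(item=rownames(tdat), plurality_vote=as.numeric(tdat))

  item plurality_vote
1    1              2
2    2              1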

A more complex function would be needed to distinguish a plurality (possibly with ties) from a true majority.
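
One way such a function might look is sketched below. It assumes "true majority" means strictly more than half of the non-missing votes, and it returns NA when no category reaches that threshold; `strict_majority` and `result` are hypothetical names.

```r
# Hypothetical helper: the category holding more than half of the votes,
# or NA when no category does (plurality only, or a tie).
strict_majority <- function(vec) {
  counts <- table(vec)              # votes per category; NAs are not counted
  top <- which.max(counts)          # position of the largest count
  if (counts[[top]] > sum(counts) / 2) names(counts)[top] else NA_character_
}

dat <- data.frame(item = c(1, 1, 1, 1, 2, 2, 2, 2),
                  category = c(2, 3, 2, 2, 2, 3, 1, 1))
result <- tapply(dat$category, dat$item, strict_majority)
result
```

For the example data, item 1 has a majority (category 2 with 3 of 4 votes), while item 2 only has a plurality (category 1 with 2 of 4 votes) and therefore yields NA.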

IRTFM answered Sep 26 '22 04:09