I need to calculate the majority vote for an item in R and I don't have a clue how to approach this.
I have a data frame with items and assigned categories. What I need is the category that was assigned the most often. How do I do this?
Data frame:
item category
1 2
1 3
1 2
1 2
2 2
2 3
2 1
2 1
Result should be:
item majority_vote
1 2
2 1
In parliamentary procedure, the term "majority" simply means "more than half." As it relates to a vote, a majority vote is more than half of the votes cast. Abstentions or blanks are excluded in calculating a majority vote.
Majority rule is a principle that means the decision-making power belongs to the group that has the most members. In politics, majority rule requires the deciding vote to have majority, that is, more than half the votes.
A two-thirds vote, when unqualified, means two-thirds or more of the votes cast. This voting basis is equivalent to the number of votes in favour being at least twice the number of votes against. Abstentions and absences are excluded in calculating a two-thirds vote.
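Both rules come down to simple arithmetic on the votes cast. Here is a minimal sketch in R. It assumes each ballot is a character value and that abstentions or blanks are recorded as `NA`. The helpers `has_majority()` and `has_two_thirds()` are hypothetical names, not part of any package:

```r
# Hypothetical helper: abstentions/blanks are assumed to be coded as NA
has_majority <- function(ballots, choice) {
  cast <- ballots[!is.na(ballots)]          # only votes actually cast count
  sum(cast == choice) > length(cast) / 2    # strictly more than half
}

# Hypothetical helper: two-thirds of votes cast is equivalent to
# "votes in favour >= 2 * votes against" (abstentions already excluded)
has_two_thirds <- function(in_favour, against) {
  in_favour > 0 && in_favour >= 2 * against
}

ballots <- c("yes", "yes", "no", NA, "yes")
has_majority(ballots, "yes")   # 3 of the 4 votes cast
has_two_thirds(3, 1)
```

Note that an exact half (for example 2 of 4) is not a majority under this definition.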
You could use two things here. First, this is how you get the most frequent item in a vector:
> v = c(1,1,1,2,2)
> names(which.max(table(v)))
[1] "1"
This is a character value, but we can easily apply as.numeric to it if necessary.
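There are two details worth knowing about this idiom. First, `table()` sorts its names. When two values tie, `which.max()` returns the first maximum, which is the value that comes first in sort order. Second, the result is a name, so it has to be converted back to a number. Below is a small sketch that wraps the idiom in `most_frequent()`, a hypothetical helper that assumes a numeric input vector:

```r
# Hypothetical helper: most frequent value of a numeric vector
most_frequent <- function(v) {
  counts <- table(v)                      # counts per distinct value, names sorted
  as.numeric(names(which.max(counts)))    # which.max picks the first max on ties
}

most_frequent(c(1, 1, 1, 2, 2))
most_frequent(c(3, 3, 1, 1))   # 1 and 3 tie; the first in sort order (1) wins
```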
Once we know how to do that, we can use the grouping functionality of the data.table package to find the most frequent category for each item. Here is the code for your example above:
> library(data.table)
> dt = data.table(item=c(1,1,1,1,2,2,2,2), category=c(2,3,2,2,2,3,1,1))
> dt
item category
1: 1 2
2: 1 3
3: 1 2
4: 1 2
5: 2 2
6: 2 3
7: 2 1
8: 2 1
> dt[,as.numeric(names(which.max(table(category)))),by=item]
item V1
1: 1 2
2: 2 1
The new V1 column contains the numeric version of the most frequent category for each item. If you want to give it a proper name, the syntax is a little uglier:
> dt[,list(mostFreqCat=as.numeric(names(which.max(table(category))))),by=item]
item mostFreqCat
1: 1 2
2: 2 1
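Inside `[...]`, data.table also accepts `.()` as an alias for `list()`, which makes the named version a bit shorter. Here is a sketch of the same call, assuming the data.table package is installed:

```r
library(data.table)

dt <- data.table(item = c(1, 1, 1, 1, 2, 2, 2, 2),
                 category = c(2, 3, 2, 2, 2, 3, 1, 1))

# .() is data.table's shorthand for list(); by = item evaluates once per item
res <- dt[, .(mostFreqCat = as.numeric(names(which.max(table(category))))),
          by = item]
res
```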
One-liner (using plyr):
library(plyr)
ddply(dt, .(item), function(x) which.max(tabulate(x$category)))
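`tabulate()` counts only the positive integers 1, 2, .... As a result, `which.max(tabulate(x))` equals the category itself only when categories are positive integer codes. If the categories are arbitrary labels, base R's `aggregate()` does the same job without extra packages. Here is a sketch; the letter labels and the helper `majority_vote()` are hypothetical:

```r
# Same votes as in the question, but with letter labels instead of codes
dat <- data.frame(item = c(1, 1, 1, 1, 2, 2, 2, 2),
                  category = c("b", "c", "b", "b", "b", "c", "a", "a"))

# Hypothetical helper: most frequent label; ties go to the first in sort order
majority_vote <- function(x) names(which.max(table(x)))

res <- aggregate(category ~ item, data = dat, FUN = majority_vote)
names(res)[2] <- "majority_vote"
res
```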
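dat <- data.frame(item=c(1,1,1,1,2,2,2,2), category=c(2,3,2,2,2,3,1,1))
# names() extracts the winning category; [1] on its own would return its count
tdat <- tapply(dat$category, dat$item, function(vec) names(sort(table(vec),
decreasing=TRUE))[1] )
data.frame(item=rownames(tdat), plurality_vote=tdat)
item plurality_vote
1 1 2
2 2 1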
A more complex function would be needed to distinguish a plurality (possibly with ties) from a true majority.
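Here is one way to make that distinction, sketched in base R. For each item, it reports the top category, whether the top count is tied, and whether that count is strictly more than half of the votes. The function `vote_summary()` and the output column names are hypothetical. Ties are only flagged, not broken:

```r
dat <- data.frame(item = c(1, 1, 1, 1, 2, 2, 2, 2),
                  category = c(2, 3, 2, 2, 2, 3, 1, 1))

# Hypothetical helper: summarise the votes cast for one item
vote_summary <- function(x) {
  counts <- table(x)
  top <- max(counts)
  data.frame(
    top_category = as.numeric(names(counts)[which.max(counts)]),
    tied         = sum(counts == top) > 1,   # several categories share the top count
    majority     = top > length(x) / 2       # strictly more than half of the votes
  )
}

# Apply per item, stack the one-row results, then prepend the item id
items <- split(dat$category, dat$item)
res <- do.call(rbind, lapply(items, vote_summary))
res <- cbind(item = as.numeric(names(items)), res)
res
```

For the question's data, item 1 gets a true majority (3 of 4 votes for category 2). Item 2's winner, category 1, is only a plurality (2 of 4 votes).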