
Get count of group-level observations with multiple individual observations from dataframe in R

Tags: dataframe, r

How do I get a dataframe like this:

soccer_player country position
"sam"         USA     left defender
"jon"         USA     right defender
"sam"         USA     left midfielder
"jon"         USA     offender
"bob"         England goalie
"julie"       England central midfielder
"jane"        England goalie

To look like this (country with the counts of unique players per country):

country player_count
USA     2
England 3

The obvious complication is that there are multiple observations per player, so I cannot simply do table(df$country) to get the number of observations per country.

I have been playing with the table() and merge() functions but have not had any luck.

asked Oct 13 '14 17:10 by goldisfine


2 Answers

Here's one way:

as.data.frame(table(unique(d[-3])$country))
#      Var1 Freq
# 1 England    3
# 2     USA    2

Drop the third column (position), remove any duplicate player-country pairs, then count the occurrences of each country.
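For anyone who wants to see the intermediate results, the same three steps can be spelled out one at a time. This is only a sketch: the data frame `d` is rebuilt here from the question's example, and the names `pairs` and `counts` are mine, not part of the original answer.

```r
# Example data rebuilt from the question; `d` is the name used in the answer above.
d <- data.frame(
  soccer_player = c("sam", "jon", "sam", "jon", "bob", "julie", "jane"),
  country       = c("USA", "USA", "USA", "USA", "England", "England", "England"),
  position      = c("left defender", "right defender", "left midfielder",
                    "offender", "goalie", "central midfielder", "goalie"),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: drop the third column, then keep one row per player-country pair.
pairs <- unique(d[-3])

# Step 3: count how many distinct players remain for each country.
counts <- as.data.frame(table(pairs$country))

# Rename the default Var1/Freq columns to match the output asked for in the question.
names(counts) <- c("country", "player_count")
counts
```

Note that `d[-3]` relies on position being the third column; `d[c("soccer_player", "country")]` would select the same columns by name.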

answered Oct 05 '22 04:10 by Matthew Plourde


The new features of dplyr v 0.3 provide a compact solution:

Data:

dd <- read.csv(text='
soccer_player,country,position
"sam",USA,left defender
"jon",USA,right defender
"sam",USA,left midfielder
"jon",USA,offender
"bob",England,goalie
"julie",England,central midfielder
"jane",England,goalie')

Code:

library(dplyr)

dd %>%
  distinct(soccer_player, country) %>%
  count(country)
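A possible variant, if you want the result column named player_count directly, is to group by country and count distinct players with `n_distinct()`. This is a different formulation from the pipeline above, offered only as a sketch; it assumes a reasonably recent dplyr is installed, and the names `dd2` and `result` are made up for the example.

```r
library(dplyr)

# Example data rebuilt from the question; `dd2` is a hypothetical name.
dd2 <- data.frame(
  soccer_player = c("sam", "jon", "sam", "jon", "bob", "julie", "jane"),
  country       = c("USA", "USA", "USA", "USA", "England", "England", "England"),
  position      = c("left defender", "right defender", "left midfielder",
                    "offender", "goalie", "central midfielder", "goalie"),
  stringsAsFactors = FALSE
)

# One row per country, counting each player once regardless of how many
# positions they appear in.
result <- dd2 %>%
  group_by(country) %>%
  summarise(player_count = n_distinct(soccer_player))

result
```

Here the de-duplication happens inside `n_distinct()` rather than in a separate `distinct()` step, so the other columns stay available until the summary.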

answered Oct 05 '22 03:10 by Ben Bolker