I'm trying to count totals for goals, primary assists, and secondary assists for each player. My problem is that I can't get my head around the logic to do that, as the data I want to summarize by (player name) is listed across three variables (goal, primary assist and secondary assist) Here's my reproducible data (it's from <code>dput()</code>, so apologies for the mess). <pre class="prettyprint"><code>mydata <- structure(list(primary_assist = c("Dmitry Gilyazitdinov", "Evgeny Orlov", "Anton Burdasov", "Sergei Kalinin", "Stanislav Solovyov", "Vasily Streltsov", NA, "Bogdan Potekhin", "Bogdan Potekhin", "Vasily Streltsov", "Vasily Streltsov", "Viktor Postnikov", "Danil Kaskov", NA, NA, "Artemy Panarin"), secondary_assist = c("Andrei Badrutdinov", NA, NA, NA, "Danil Gubarev", "Nikita Manukhov", NA, "Evgeny Grigorenko", "Daniil Apalkov", "Ivan Boiko", NA, "Viktor Antipin", "Vitaly Sychov", NA, NA, "Stanislav Levin"), goal = c("Vitaly Kropachyov", "Dmitry Kozlov", "Stanislav Solovyov", "Kirill Polyansky", "Anton Burdasov", "Ilya Solodov", "Alexander Antropov", "Daniil Apalkov", "Evgeny Grigorenko", "Alexander Antropov", "Alexander Antropov", "Evgeny Grigorenko", "Denis Belonogov", "Vitaly Sychov", "Alexander Streltsov", "Pyotr Kopyttsov" ), team = c("Belye Medvedi", "Omskie Yastreby", "Belye Medvedi", "Omskie Yastreby", "Belye Medvedi", "Avto", "Avto", "Stalnye Lisy", "Stalnye Lisy", "Avto", "Avto", "Stalnye Lisy", "Avto", "Avto", "Avto", "Russkie Vityazi"), game_strength = c("PP", "EV", "EV", "EV", "EV", "PP", "SO", "EV", "PP", "PP", "EV", "PP", "PP", "EV", "PP", "EV"), season = c("2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10" )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -16L), .Names = c("primary_assist", "secondary_assist", "goal", "team", "game_strength", "season")) </code></pre> <pre class="prettyprint lang-r prettyprint-override"><code>mydata #> # A tibble: 16 x 6 #> primary_assist secondary_assist goal team game_strength season #> <chr> <chr> <chr> <chr> <chr> <chr> #> 1 Dmitry Gilyazitdinov Andrei Badrutdin~ Vita~ Bely~ PP 2009-~ #> 2 Evgeny Orlov <NA> Dmit~ Omsk~ EV 2009-~ #> 3 Anton Burdasov <NA> Stan~ Bely~ EV 2009-~ #> 4 Sergei Kalinin <NA> Kiri~ Omsk~ EV 2009-~ #> 5 Stanislav Solovyov Danil Gubarev Anto~ Bely~ EV 2009-~ #> 6 Vasily Streltsov Nikita Manukhov Ilya~ Avto PP 2009-~ #> 7 <NA> <NA> Alex~ Avto SO 2009-~ #> 8 Bogdan Potekhin Evgeny Grigorenko Dani~ Stal~ EV 2009-~ #> 9 Bogdan Potekhin Daniil Apalkov Evge~ Stal~ PP 2009-~ #> 10 Vasily Streltsov Ivan Boiko Alex~ Avto PP 2009-~ #> 11 Vasily Streltsov <NA> Alex~ Avto EV 2009-~ #> 12 Viktor Postnikov Viktor Antipin Evge~ Stal~ PP 2009-~ #> 13 Danil Kaskov Vitaly Sychov Deni~ Avto PP 2009-~ #> 14 <NA> <NA> Vita~ Avto EV 2009-~ #> 15 <NA> <NA> Alex~ Avto PP 2009-~ #> 16 Artemy Panarin Stanislav Levin Pyot~ Russ~ EV 2009-~ </code></pre> So, I want to count the number of goals, primary assists, and secondary assists for each player, and then have 1 row for each player. Let's say the name "Artemy Panarin" is listed 1 time in goals, 0 times in primary assists, and 2 times in secondary assists, my output would look like this: <pre class="prettyprint lang-r prettyprint-override"><code>tibble::tibble(name = c("Artemy Panarin", "Stanislav Levin", "Danil Kaskov"), team = c("Russkie Vityazi", "Russkie Vityazi", "Avto"), goals = c(1, 1, 0), primary_assists = c(0, 0, 1), secondary_assists = c(2, 0, 0)) #> # A tibble: 3 x 5 #> name team goals primary_assists secondary_assists #> <chr> <chr> <dbl> <dbl> <dbl> #> 1 Artemy Panarin Russkie Vityazi 1.00 0 2.00 #> 2 Stanislav Levin Russkie Vityazi 1.00 0 0 #> 3 Danil Kaskov Avto 0 1.00 0 </code></pre> Does that make any sense? Any ideas? Tidyverse solutions preferred. Thanks!

We can <code>gather</code> to 'long' format, grouped by 'name', 'team', and the 'key' column (from <code>gather</code>), <code>summarise</code> to get the count and <code>spread</code> back to 'wide' format <pre class="prettyprint"><code>library(tidyverse) gather(mydata, key, name, primary_assist:goal) %>% group_by(name, team, key) %>% summarise(n = n()) %>% spread(key, n, fill = 0) # A tibble: 30 x 5 # Groups: name, team [30] # name team goal primary_assist secondary_assist # <chr> <chr> <dbl> <dbl> <dbl> # 1 Alexander Antropov Avto 3 0 0 # 2 Alexander Streltsov Avto 1 0 0 # 3 Andrei Badrutdinov Belye Medvedi 0 0 1 # 4 Anton Burdasov Belye Medvedi 1 1 0 # 5 Artemy Panarin Russkie Vityazi 0 1 0 # 6 Bogdan Potekhin Stalnye Lisy 0 2 0 # 7 Daniil Apalkov Stalnye Lisy 1 0 1 # 8 Danil Gubarev Belye Medvedi 0 0 1 # 9 Danil Kaskov Avto 0 1 0 #10 Denis Belonogov Avto 1 0 0 # ... with 20 more rows </code></pre>

Summarizing data by name separated across multiple variables

Tags:

r

dplyr

data-cleaning

I'm trying to count totals for goals, primary assists, and secondary assists for each player. My problem is that I can't get my head around the logic to do that, as the data I want to summarize by (player name) is listed across three variables (goal, primary assist and secondary assist)

Here's my reproducible data (it's from dput(), so apologies for the mess).

mydata <- structure(list(primary_assist = c("Dmitry Gilyazitdinov", "Evgeny Orlov", 
"Anton Burdasov", "Sergei Kalinin", "Stanislav Solovyov", "Vasily Streltsov", 
NA, "Bogdan Potekhin", "Bogdan Potekhin", "Vasily Streltsov", 
"Vasily Streltsov", "Viktor Postnikov", "Danil Kaskov", NA, NA, 
"Artemy Panarin"), secondary_assist = c("Andrei Badrutdinov", 
NA, NA, NA, "Danil Gubarev", "Nikita Manukhov", NA, "Evgeny Grigorenko", 
"Daniil Apalkov", "Ivan Boiko", NA, "Viktor Antipin", "Vitaly Sychov", 
NA, NA, "Stanislav Levin"), goal = c("Vitaly Kropachyov", "Dmitry Kozlov", 
"Stanislav Solovyov", "Kirill Polyansky", "Anton Burdasov", "Ilya Solodov", 
"Alexander Antropov", "Daniil Apalkov", "Evgeny Grigorenko", 
"Alexander Antropov", "Alexander Antropov", "Evgeny Grigorenko", 
"Denis Belonogov", "Vitaly Sychov", "Alexander Streltsov", "Pyotr Kopyttsov"
), team = c("Belye Medvedi", "Omskie Yastreby", "Belye Medvedi", 
"Omskie Yastreby", "Belye Medvedi", "Avto", "Avto", "Stalnye Lisy", 
"Stalnye Lisy", "Avto", "Avto", "Stalnye Lisy", "Avto", "Avto", 
"Avto", "Russkie Vityazi"), game_strength = c("PP", "EV", "EV", 
"EV", "EV", "PP", "SO", "EV", "PP", "PP", "EV", "PP", "PP", "EV", 
"PP", "EV"), season = c("2009-10", "2009-10", "2009-10", "2009-10", 
"2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10", 
"2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-16L), .Names = c("primary_assist", "secondary_assist", "goal", 
"team", "game_strength", "season"))

mydata
#> # A tibble: 16 x 6
#>    primary_assist       secondary_assist  goal  team  game_strength season
#>    <chr>                <chr>             <chr> <chr> <chr>         <chr> 
#>  1 Dmitry Gilyazitdinov Andrei Badrutdin~ Vita~ Bely~ PP            2009-~
#>  2 Evgeny Orlov         <NA>              Dmit~ Omsk~ EV            2009-~
#>  3 Anton Burdasov       <NA>              Stan~ Bely~ EV            2009-~
#>  4 Sergei Kalinin       <NA>              Kiri~ Omsk~ EV            2009-~
#>  5 Stanislav Solovyov   Danil Gubarev     Anto~ Bely~ EV            2009-~
#>  6 Vasily Streltsov     Nikita Manukhov   Ilya~ Avto  PP            2009-~
#>  7 <NA>                 <NA>              Alex~ Avto  SO            2009-~
#>  8 Bogdan Potekhin      Evgeny Grigorenko Dani~ Stal~ EV            2009-~
#>  9 Bogdan Potekhin      Daniil Apalkov    Evge~ Stal~ PP            2009-~
#> 10 Vasily Streltsov     Ivan Boiko        Alex~ Avto  PP            2009-~
#> 11 Vasily Streltsov     <NA>              Alex~ Avto  EV            2009-~
#> 12 Viktor Postnikov     Viktor Antipin    Evge~ Stal~ PP            2009-~
#> 13 Danil Kaskov         Vitaly Sychov     Deni~ Avto  PP            2009-~
#> 14 <NA>                 <NA>              Vita~ Avto  EV            2009-~
#> 15 <NA>                 <NA>              Alex~ Avto  PP            2009-~
#> 16 Artemy Panarin       Stanislav Levin   Pyot~ Russ~ EV            2009-~

So, I want to count the number of goals, primary assists, and secondary assists for each player, and then have 1 row for each player. Let's say the name "Artemy Panarin" is listed 1 time in goals, 0 times in primary assists, and 2 times in secondary assists, my output would look like this:

tibble::tibble(name = c("Artemy Panarin", "Stanislav Levin", "Danil Kaskov"), team = c("Russkie Vityazi", "Russkie Vityazi", "Avto"), goals = c(1, 1, 0), primary_assists = c(0, 0, 1), secondary_assists = c(2, 0, 0))
#> # A tibble: 3 x 5
#>   name            team            goals primary_assists secondary_assists
#>   <chr>           <chr>           <dbl>           <dbl>             <dbl>
#> 1 Artemy Panarin  Russkie Vityazi  1.00            0                 2.00
#> 2 Stanislav Levin Russkie Vityazi  1.00            0                 0   
#> 3 Danil Kaskov    Avto             0               1.00              0

Does that make any sense? Any ideas? Tidyverse solutions preferred. Thanks!

786

asked Jul 18 '18 15:07

Evan O.

2 Answers

We can gather to 'long' format, grouped by 'name', 'team', and the 'key' column (from gather), summarise to get the count and spread back to 'wide' format

library(tidyverse)
gather(mydata, key, name, primary_assist:goal) %>% 
     group_by(name, team, key) %>% 
     summarise(n = n()) %>% 
     spread(key, n, fill = 0)
# A tibble: 30 x 5
# Groups:   name, team [30]
#   name                team             goal primary_assist secondary_assist
#   <chr>               <chr>           <dbl>          <dbl>            <dbl>
# 1 Alexander Antropov  Avto                3              0                0
# 2 Alexander Streltsov Avto                1              0                0
# 3 Andrei Badrutdinov  Belye Medvedi       0              0                1
# 4 Anton Burdasov      Belye Medvedi       1              1                0
# 5 Artemy Panarin      Russkie Vityazi     0              1                0
# 6 Bogdan Potekhin     Stalnye Lisy        0              2                0
# 7 Daniil Apalkov      Stalnye Lisy        1              0                1
# 8 Danil Gubarev       Belye Medvedi       0              0                1
# 9 Danil Kaskov        Avto                0              1                0
#10 Denis Belonogov     Avto                1              0                0
# ... with 20 more rows

138

answered Sep 20 '22 15:09

akrun

One way to get the result is to reshape your data with gather()/spread() in addition to your summarize strategy.

library(tidyverse)

scoring_summary <- mydata %>%
    select(primary_assist:team) %>%
    gather("key", "player", -team) %>%
    group_by(player) %>%
    count(key) %>%
    spread(key, n)

# convert NAs to 0
scoring_summary[is.na(scoring_summary)] <- 0

scoring_summary

# A tibble: 28 x 4
# Groups:   player [28]
player               goal primary_assist secondary_assist
<chr>               <dbl>          <dbl>            <dbl>
1 Alexander Antropov      3              0                0
2 Alexander Streltsov     1              0                0
3 Andrei Badrutdinov      0              0                1
4 Anton Burdasov          1              1                0
5 Artemy Panarin          0              1                0
6 Bogdan Potekhin         0              2                0
7 Daniil Apalkov          1              0                1
8 Danil Gubarev           0              0                1
9 Danil Kaskov            0              1                0
10 Denis Belonogov        1              0                0

count() is doing the same thing as your original attempt with summarise(count(goals)

answered Sep 21 '22 15:09

Nate

Related questions
                            
                                R: Deleting rows based on a value in a column from a large data set in R [duplicate]
                            
                                Using group_by with mutate_if by column name
                            
                                Split data frame by two factors
                            
                                get a line break / new line in excel file with r xlsx
                            
                                replace values throughout a tibble
                            
                                Subsetting geojson data with R
                            
                                Correlation Matrix - tidyr gather v. reshape2 melt
                            
                                dplyr number of rows across groups after filtering
                            
                                How to put plots without any space using plot_grid?
                            
                                convert all factor columns to character in a data.frame without affecting non-factor columns
                            
                                How to plot dataframe in R as a heatmap/grid?
                            
                                How can I add a message box in R Shiny?
                            
                                Using stat_function to draw partially shaded normal curve in ggplot2
                            
                                R data.table compute new column, but insert at beginning
                            
                                In R, using melt(), how can I hide warning messages?
                            
                                Does `tfread` exist?
                            
                                Rank most recent scores of students within a given date - 30 days window
                            
                                Is this what rnorm(x) does if x is a vector, and how could I have found out faster?
                            
                                Overlaying line on contour plot using Plotly
                            
                                Find all records which have multiple values in a column in R

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With