 

Complete column with group_by and complete

Tags: r, dplyr, tidyr

I have a small problem with dplyr's group_by function. After running this:

datasetALL %>% group_by(YEAR,Region) %>% summarise(count_number = n()) 

I get this result:

   YEAR Region count_number
  <int>  <int>        <int>
1  1946      1            2
2  1946      2            3
3  1946      3            1
4  1946      5            1
5  1947      3            1
6  1947      4            1

I would like something like this, with a zero count for every YEAR/Region combination that has no rows:

    YEAR Region count_number
   <int>  <int>        <int>
 1  1946      1            2
 2  1946      2            3
 3  1946      3            1
 4  1946      5            1
 5  1946      4            0  # order is not important
 6  1947      1            0
 7  1947      2            0
 8  1947      3            1
 9  1947      4            1
10  1947      5            0

I tried to use complete() from the tidyr package, but it did not work.

Ben asked Apr 19 '17 16:04




2 Answers

Using complete() from the tidyr package should work; its documentation describes the arguments in detail.

What probably happened is that you did not remove the grouping. After summarise(), the result is still grouped by YEAR, so complete() works within each group and only sees the Region values that already occur in that year. All of those combinations already exist, so nothing is added. Remove the grouping first and then call complete():

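datasetALL %>% 
    group_by(YEAR, Region) %>% 
    summarise(count_number = n()) %>%
    # drop the remaining YEAR grouping so complete() sees the whole table
    ungroup() %>%
    # add the missing YEAR/Region pairs with a count of 0
    complete(YEAR, Region, fill = list(count_number = 0))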
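
To check this on a small example, here is a minimal sketch. The raw rows in `datasetALL` are an assumption: they are reconstructed from the counts shown in the question, since the original data was not posted. It assumes a reasonably recent dplyr and tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw data, reconstructed from the counts in the question
datasetALL <- tibble(
  YEAR   = c(rep(1946L, 7), rep(1947L, 2)),
  Region = c(1L, 1L, 2L, 2L, 2L, 3L, 5L, 3L, 4L)
)

counts <- datasetALL %>%
  group_by(YEAR, Region) %>%
  summarise(count_number = n())

# summarise() only peels off the last grouping variable by default
group_vars(counts)  # "YEAR"

# After ungroup(), complete() expands over all YEAR x Region pairs
completed <- counts %>%
  ungroup() %>%
  complete(YEAR, Region, fill = list(count_number = 0L))

print(completed)
```

With two years and five regions, `completed` should contain 10 rows, with 0 for the pairs that had no observations.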
Pieter answered Sep 30 '22 08:09


As already mentioned, complete() from tidyr solves this. You can also use its nesting() helper; here df is assumed to be the summarised data with the grouping removed:

complete(df, YEAR, nesting(Region), fill = list(count_number = 0))

    YEAR Region count_number
   <int>  <int>        <dbl>
 1  1946      1            2
 2  1946      2            3
 3  1946      3            1
 4  1946      4            0
 5  1946      5            1
 6  1947      1            0
 7  1947      2            0
 8  1947      3            1
 9  1947      4            1
10  1947      5            0
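
As a possible alternative (not from the answers above), dplyr's `count()` can replace the `group_by()`/`summarise()` pair. Called on an ungrouped data frame it returns an ungrouped result, so no `ungroup()` step is needed before `complete()`. The sketch below again uses hypothetical raw data reconstructed from the question's counts.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw data, reconstructed from the counts in the question
datasetALL <- tibble(
  YEAR   = c(rep(1946L, 7), rep(1947L, 2)),
  Region = c(1L, 1L, 2L, 2L, 2L, 3L, 5L, 3L, 4L)
)

result <- datasetALL %>%
  # count() tallies rows per YEAR/Region pair and leaves the result ungrouped
  count(YEAR, Region, name = "count_number") %>%
  # fill in the pairs that never occurred with a count of 0
  complete(YEAR, Region, fill = list(count_number = 0L))

print(result)
```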
tmfmnk answered Sep 30 '22 10:09