Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How I can count the value of a coulmn of different group?

Tags:

dataframe

r

Suppose there are 5 different columns in my data set.

Group   person   driving-license   #-cars   #trips
  1       1             (1)yes       2        1
  1       1             (1)yes       2        2
  1       1             (1)yes       2        3
  1       2             (1)yes       2        1
  1       2             (1)yes       2        2    
  2       1             (0)No        1        1
  2       1             (0)No        1        2
  2       2             (1)yes       1        1
  2       2             (1)yes       1        2
  2       2             (1)yes       1        3

in this sample there are 2 groups and in the first group there are 2 persons .The first one has 3 trips and second person 2 trips. So the 3 first rows belongs to the first person and next 2 rows to second person.

4-th columns is the number of cars in each groups. in first group there are 2 cars and in second group 1.

I want to create a vector and check that if in each groups the number of persons with driving-license is more than cars or not. (1 if it is and 0 otherwise).

What is the shortest and best way to do that?


1 Answers

Using dplyr we can compare number of persons in each Group to the number of cars.


Comparing number of persons with driving license to the number of cars in each group:

library(dplyr)

df1 %>% 
  filter(DrivingLicense=="(1)yes") %>% 
  mutate(MyVector=+(n_distinct(person)>cars)) %>% 
  group_by(Group) %>% 
  summarise(MyVector=max(MyVector)) %>% 
  left_join(df1, ., by="Group")
#>    Group person DrivingLicense cars trips MyVector
#> 1      1      1         (1)yes    2     1        0
#> 2      1      1         (1)yes    2     2        0
#> 3      1      1         (1)yes    2     3        0
#> 4      1      2         (1)yes    2     1        0
#> 5      1      2         (1)yes    2     2        0
#> 6      2      1          (0)No    1     1        1
#> 7      2      1          (0)No    1     2        1
#> 8      2      2         (1)yes    1     1        1
#> 9      2      2         (1)yes    1     2        1
#> 10     2      2         (1)yes    1     3        1

Based on all persons regardless of their driving status:

df1 %>% 
  group_by(Group) %>% 
  mutate(MyVector=+(n_distinct(person)>cars))
#> # A tibble: 10 x 6
#> # Groups:   Group [2]
#>    Group person DrivingLicense  cars trips MyVector
#>    <int>  <int> <fct>          <int> <int>    <int>
#>  1     1      1 (1)yes             2     1        0
#>  2     1      1 (1)yes             2     2        0
#>  3     1      1 (1)yes             2     3        0
#>  4     1      2 (1)yes             2     1        0
#>  5     1      2 (1)yes             2     2        0
#>  6     2      1 (0)No              1     1        1
#>  7     2      1 (0)No              1     2        1
#>  8     2      2 (1)yes             1     1        1
#>  9     2      2 (1)yes             1     2        1
#> 10     2      2 (1)yes             1     3        1

Data:

df1 <- read.table(text="Group   person   DrivingLicense   cars   trips
  1       1             (1)yes       2        1
  1       1             (1)yes       2        2
  1       1             (1)yes       2        3
  1       2             (1)yes       2        1
  1       2             (1)yes       2        2    
  2       1             (0)No        1        1
  2       1             (0)No        1        2
  2       2             (1)yes       1        1
  2       2             (1)yes       1        2
  2       2             (1)yes       1        3", header=T)

Created on 2019-06-29 by the reprex package (v0.3.0)

like image 200
M-- Avatar answered Jan 27 '26 12:01

M--