Assuming this is my dataframe:
df <- data.frame(grp = c("ab -10", "ab 0", "ab 8", "ab -1",
"ab 6", "ab 6", "ab -10", "ab 1",
"ab -10", "ab 0", "ab 8", "ab -1",
"ab 6", "ab 6", "ab -10", "ab 1",
"d", "e", "e", "e"),
freq = c(1,0,0,1,0,1,2,0,1,0,2,2,1,1,0,1,0,2,2,1))
df
grp freq
1 ab -10 1
2 ab 0 0
3 ab 8 0
4 ab -1 1
5 ab 6 0
6 ab 6 1
7 ab -10 2
8 ab 1 0
9 ab -10 1
10 ab 0 0
11 ab 8 2
12 ab -1 2
13 ab 6 1
14 ab 6 1
15 ab -10 0
16 ab 1 1
17 d 0
18 e 2
19 e 2
20 e 1
I want to have:
> finaldf
grp freq
1 ab < 0 7
2 ab 0-5 1
3 ab 5+ 5
4 d 0
5 e 5
This is what I tried:
df %>%
bind_rows(df %>%
filter(!grepl("ab", grp)),
df %>%
filter(grepl("ab", grp)) %>%
mutate(grp = parse_number(grp)) %>%
mutate(grp = cut(as.numeric(grp),
breaks = c(-999, 0, 6, 999),
labels = c("ab < 0", "ab 0-5", "ab 5+"),
right = F))) %>%
group_by(grp) %>%
summarise(N =n())
but it seems like bind_rows is duplicating dataframes.
grp freq
1 ab -10 1
2 ab 0 0
3 ab 8 0
4 ab -1 1
5 ab 6 0
6 ab 6 1
7 ab -10 2
8 ab 1 0
9 ab -10 1
10 ab 0 0
11 ab 8 2
12 ab -1 2
13 ab 6 1
14 ab 6 1
15 ab -10 0
16 ab 1 1
17 d 0
18 e 2
19 e 2
20 e 1
21 d 0
22 e 2
23 e 2
24 e 1
25 ab < 0 1
26 ab 0-5 0
27 ab 5+ 0
28 ab < 0 1
29 ab 5+ 0
30 ab 5+ 1
31 ab < 0 2
32 ab 0-5 0
33 ab < 0 1
34 ab 0-5 0
35 ab 5+ 2
36 ab < 0 2
37 ab 5+ 1
38 ab 5+ 1
39 ab < 0 0
40 ab 0-5 1
I can slice() half of the rows, but I m more interested in knowing what I am doing wrong?
Any other neat and pretty approach is also highly appreciated!
Here is one method where will split the column into 'two' with separate, recode the numeric values, unite and then do a group by sum
library(dplyr)
library(tidyr)
df %>%
separate(grp, into = c('grp1', 'value'), sep = "(?<=ab)\\s+",
fill = "right", convert = TRUE) %>%
mutate(value = case_when(value <0 ~ '< 0',
between(value, 0, 5) ~ '0-5', value > 5 ~ '5+')) %>%
unite(grp, grp1, value, na.rm = TRUE, sep=" ") %>%
group_by(grp) %>%
summarise(freq = sum(freq), .groups = 'drop')
-output
# A tibble: 5 × 2
grp freq
<chr> <dbl>
1 ab < 0 7
2 ab 0-5 1
3 ab 5+ 5
4 d 0
5 e 5
In the OP's code, it is the beginning df %>% needs to removed as we are passing both filtered datasets in bind_rows. When we add the df %>%, it will be passed as the first argument to bind_rows, thus duplicating the rows
library(readr)
bind_rows(df %>%
filter(!grepl("ab", grp)),
df %>%
filter(grepl("ab", grp)) %>%
mutate(grp = parse_number(grp)) %>%
mutate(grp = cut(as.numeric(grp),
breaks = c(-999, 0, 6, 999),
labels = c("ab < 0", "ab 0-5", "ab 5+"),
right = FALSE))) %>%
group_by(grp) %>%
summarise(N =sum(freq))
# A tibble: 5 × 2
grp N
<chr> <dbl>
1 ab < 0 7
2 ab 0-5 1
3 ab 5+ 5
4 d 0
5 e 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With