 

Using dplyr and stringr to replace all values that start with a given string

Tags: r, dplyr, stringr

My data frame:

> df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"), sold = rnorm(5, 100))
>   df
          food      sold
1 fruit banana  99.47171
2  fruit apple  99.40878
3  fruit grape  99.28727
4        bread  99.15934
5         meat 100.53438

Now I want to replace every value in food that starts with "fruit" with just "fruit", then group by food and summarise sold as the sum of sold.

> df %>%
+     mutate(food = replace(food, str_detect(food, "fruit"), "fruit")) %>% 
+     group_by(food) %>% 
+     summarise(sold = sum(sold))
Source: local data frame [3 x 2]

    food      sold
  (fctr)     (dbl)
1  bread  99.15934
2   meat 100.53438
3     NA 298.16776

Why doesn't this command work? It gives me NA instead of "fruit".
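A minimal reproduction of the symptom, assuming food is a factor (the default for data.frame() before R 4.0.0). The toy vector below is hypothetical; it only mirrors the shape of the question's column.

```r
# Hypothetical toy vector mirroring the question's food column (a factor)
food <- factor(c("fruit banana", "fruit apple", "bread"))

# replace() assigns with `[<-`; "fruit" is not an existing level of the factor,
# so those positions become NA (R warns "invalid factor level, NA generated")
res <- suppressWarnings(replace(food, grepl("fruit", food), "fruit"))
print(res)
```

The first two elements print as <NA>, which is the same NA group that appears in the summarised result.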

Tomas Ericsson, asked May 04 '17 09:05


2 Answers

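It works for me. I think your food column is a factor: replace() cannot store a value that is not already one of the factor's levels, so "fruit" becomes NA (R warns "invalid factor level, NA generated").

Use stringsAsFactors = FALSE when creating the data frame, as below, or run options(stringsAsFactors = FALSE) in your R session to avoid this. (Since R 4.0.0, stringsAsFactors = FALSE is already the default for data.frame().)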

library(dplyr)
library(stringr)

df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
                 sold = rnorm(5, 100),
                 stringsAsFactors = FALSE)

df %>%
  mutate(food = replace(food, str_detect(food, "fruit"), "fruit")) %>%
  group_by(food) %>%
  summarise(sold = sum(sold))

Output:

# A tibble: 3 × 2
   food      sold
  <chr>     <dbl>
1 bread  99.67661
2 fruit 300.28520
3  meat  99.88566
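If the data frame already exists with a factor column, one sketch is to convert to character inside mutate() instead of re-creating the data. It also swaps str_detect() for str_starts() (stringr 1.4.0 and later), which matches only at the beginning of the string, closer to the "starts with" wording of the question; str_detect() matches anywhere. The seed value is arbitrary, and stringsAsFactors = TRUE is set only to reproduce the question's factor column.

```r
library(dplyr)
library(stringr)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
                 sold = rnorm(5, 100),
                 stringsAsFactors = TRUE)  # force a factor column, as in the question

res <- df %>%
  # as.character() first, so "fruit" is a legal value; if_else() needs both
  # branches to be character, and str_starts() only matches a leading "fruit"
  mutate(food = if_else(str_starts(as.character(food), "fruit"),
                        "fruit",
                        as.character(food))) %>%
  group_by(food) %>%
  summarise(sold = sum(sold))

print(res)
```

Converting inside the pipeline keeps the fix local to this computation, without changing global options.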
PKumar, answered Sep 18 '22 07:09


We can do this in base R without converting to character: rename every level that contains "fruit" to "fruit", then use aggregate() to get the sum:

levels(df$food)[grepl("fruit", levels(df$food))] <- "fruit"
aggregate(sold~food, df, sum)
#   food      sold
#1 bread  99.41637
#2 fruit 300.41033
#3  meat 100.84746

Data:

set.seed(24)
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", 
                 "bread", "meat"), sold = rnorm(5, 100))
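The levels<- step works because giving several levels the same name merges them into a single level, so every row keeps its position but now carries the label "fruit". A small sketch on a hypothetical toy factor; the anchored pattern "^fruit" is a variation on the answer's grepl("fruit", ...) that matches only levels starting with "fruit".

```r
# Hypothetical toy factor; levels are sorted: "bread" "fruit apple" "fruit banana"
f <- factor(c("fruit banana", "fruit apple", "bread"))

# Rename both fruit levels to "fruit"; duplicated level names are merged into one
levels(f)[grepl("^fruit", levels(f))] <- "fruit"

levels(f)        # "bread" "fruit"
as.character(f)  # "fruit" "fruit" "bread"
```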
akrun, answered Sep 18 '22 07:09