I guess it might be a simple trick, but I do not know how to achieve it...
My dateset looks like:
Name, Score
A a, 20
A, 30
B b, 40
The output I expect is:
Name, Score
A, 50
B, 40
In one word, sum the scores that have names which start with the same word (before the space if there is one). I hope the example is self-explanatory. :)
PS: The faster the code runs, the better. The dataset is huge...
Another option would be separate
library(dplyr)
library(tidyr)
separate(df1, Name, into=c("Name", "extra")) %>%
group_by(Name) %>%
summarise(Score=sum(Score))
# Name Score
# (chr) (int)
#1 A 50
#2 B 40
Or extract
extract(df1, Name, into= "Name", "(\\S+).*") %>%
group_by(Name) %>%
summarise(Score = sum(Score))
You can try something like this:
library(dplyr)
library(stringr)
df$newName <- str_extract(df$Name, '[[:alnum:]]+')
df %>% group_by(newName) %>% summarise(Score = sum(Score))
Source: local data frame [2 x 2]
newName Score
(chr) (int)
1 A 50
2 B 40
Note, you will want to make sure 'Name' is read as a character vector and not as factors. Use stringsAsFactors = FALSE
in your read call, or use as.character
to convert it.
If you want the full first 'string', you can also use this regex pattern:
df$newName <- str_extract(df$Name, '([^\\s]+)')
I used substr
to extract the first letter and then group_by
. I believe that dplyr starts_with
is used to make a selection of whole columns based on their titles. This solution only works if the letter you want to select is always the first letter.
require(dplyr)
df<-data.frame(Name=c("A a,","A,","B b"),Score=c(20,30,40))
df$Name <- substr(df$Name,1,1)
df %>% group_by(Name) %>% summarise(sum_score=sum(Score))
Source: local data frame [2 x 2]
Name sum_score
(chr) (dbl)
1 A 50
2 B 40
You could also create the substring column as a new column and group by that if you want to keep the original data as it is.
starts_with
is used in select
and rename
and are operations on the columnnames not on the values in the columns. By using gsub
you can extract the first letter (or word) and then summarise. With:
sumdf <- mydf %>%
group_by(Name = gsub("[^A-Za-z0-9].*", "", Name)) %>%
summarise(sumScore = sum(Score))
you get:
> sumdf
Name sumScore
1 A 50
2 B 40
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With