Group data frame by pattern in R

Question

I have an R data frame with hundreds of rows, like this:

word        Freq
seed         4
seeds        3
contract     2
contracting  2
river        1

I would like to group the rows by pattern (say, seed + seeds, ...) so that the result looks like this:

word     Freq
seed      7
contract  4
river     1

jazzurro · Accepted Answer

Here is another possible approach. The SnowballC package provides wordStem(), which reduces words to their stems. With it you can skip the string manipulation, I think. Once the words are stemmed, all that remains is to sum the frequencies for each stem.

library(SnowballC)
library(dplyr)

mydf <- read.table(text = "word        Freq
seed         4
seeds        3
contract     2
contracting  2
river        1", header = TRUE, stringsAsFactors = FALSE)

# replace each word with its stem, then sum the frequencies per stem
mutate(mydf, word = wordStem(word)) %>%
  group_by(word) %>%
  summarise(total = sum(Freq))

#      word total
#     (chr) (int)
#1 contract     4
#2    river     1
#3     seed     7
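
If you want to check which word was mapped to which stem before summing, here is a minimal sketch that keeps the stem in its own column and aggregates with base R instead of dplyr. The column name stem and the object name totals are just illustrative choices, not part of the answer above. Keep in mind that stems are not always dictionary words, so it is worth looking at the mapping on real data.

```r
library(SnowballC)

mydf <- data.frame(
  word = c("seed", "seeds", "contract", "contracting", "river"),
  Freq = c(4, 3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# keep the original word next to its stem so the mapping can be inspected
mydf$stem <- wordStem(mydf$word)

# sum the frequencies per stem with base R
totals <- aggregate(Freq ~ stem, data = mydf, FUN = sum)
totals
```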

akrun · Answer

One option is to create a grouping variable 'gr' from the first n characters of 'word', where n is the length of the shortest word in the data. Within each 'gr' group, truncate 'word' the same way, this time to the length of the shortest word in that group, so that every word in a group collapses to the same substring. Then sum 'Freq' by 'word'.

library(dplyr)
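df1 %>%
    # first pass: group by the first n characters, n = length of the shortest word overall
    group_by(gr = substr(word, 1, min(nchar(word)))) %>%
    # second pass: within each 'gr' group, truncate to the shortest word of that group
    mutate(word = substr(word, 1, min(nchar(word)))) %>%
    group_by(word) %>%
    summarise(Freq = sum(Freq))
#       word  Freq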
#      (chr) (int)
#1 contract     4
#2    river     1
#3     seed     7
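
To make the two truncation passes easier to follow, here is a rough base R sketch of the same idea with the intermediate values spelled out. The data frame df1 is rebuilt from the question's sample data, and the helper names n_min, gr, shortest and res are only illustrative.

```r
df1 <- data.frame(
  word = c("seed", "seeds", "contract", "contracting", "river"),
  Freq = c(4, 3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# pass 1: prefix of length n_min, the length of the shortest word overall
n_min <- min(nchar(df1$word))
gr <- substr(df1$word, 1, n_min)   # "seed" "seed" "cont" "cont" "rive"

# pass 2: length of the shortest word inside each prefix group
shortest <- ave(nchar(df1$word), gr, FUN = min)
df1$word <- substr(df1$word, 1, shortest)

# sum the frequencies per truncated word
res <- aggregate(Freq ~ word, data = df1, FUN = sum)
res
```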

bramtayl · Answer

You can also do this with a cross join, which is a little safer than the method above.

library(dplyr)
library(stringi)

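df %>%
  # cross join: pair every word with every candidate short_word
  merge(df %>% select(short_word = word)) %>%
  # keep the pairs where short_word matches inside word
  filter(stri_detect_regex(word, short_word)) %>%
  group_by(word) %>%
  # for each word, keep the shortest matching short_word
  slice(which.min(stri_length(short_word))) %>%
  group_by(short_word) %>%
  summarise(Freq = sum(Freq))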
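
For anyone who wants to see the intermediate steps, here is a rough base R sketch of the same cross-join idea on the question's sample data. Note one deliberate difference: it tests whether short_word is a prefix of word with startsWith() rather than a regex match anywhere in the word, which is an assumption about what "pattern" means here. The names pairs and res are illustrative.

```r
df <- data.frame(
  word = c("seed", "seeds", "contract", "contracting", "river"),
  Freq = c(4, 3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# cross join: by = NULL makes merge() return the Cartesian product
candidates <- data.frame(short_word = df$word, stringsAsFactors = FALSE)
pairs <- merge(df, candidates, by = NULL)

# keep only the pairs where short_word is a prefix of word
pairs <- pairs[startsWith(pairs$word, pairs$short_word), ]

# for each word, keep the shortest matching short_word
pairs <- pairs[order(pairs$word, nchar(pairs$short_word)), ]
pairs <- pairs[!duplicated(pairs$word), ]

# sum the frequencies per short_word
res <- aggregate(Freq ~ short_word, data = pairs, FUN = sum)
res
```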

Tags: pattern-matching, r, aggregate

Samuel Shamiri
