Group categories in R according to first letters of a string?

Question

I have a dataset loaded in R, and one of its columns contains text. The values are not unique (several rows can share the same value), but each value encodes a specific condition of its row: the first 3-5 letters identify the group the row belongs to. Let me explain with an example.

Here are 3 rows, showing only the ID and the column I need to group by:

ID........... TEXTFIELD    
1............ VGH2130    
2............ BFGF2345    
3............ VGH3321

Given the previous example, I would like to add a new column to the data frame that holds the group, like this:

ID........... TEXTFIELD........... NEWCOL    
1............ VGH2130............. VGH    
2............ BFGF2345............ BFGF    
3............ VGH3321............. VGH

To determine which groups can appear in this new column, I would like to define a vector of the possible groups, since every row belongs to one of them (for example groups <- c("VGH", "BFGF", ...)).

Can anyone shed some light on how to do this efficiently? I would like to avoid a for loop over the rows, since I have millions of rows and that would take ages.

Prasanna Nandakumar · Accepted Answer

You can also try str_extract() from the stringr package:

> library(stringr)
> data$group <- str_extract(data$TEXTFIELD, "^[A-Za-z]+")
> data
  ID TEXTFIELD group
1  1   VGH2130   VGH
2  2  BFGF2345  BFGF
3  3   VGH3321   VGH
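
If you would rather not depend on stringr, the same idea can be sketched in base R with regexpr() and regmatches(). This is only a minimal sketch: it rebuilds the three example rows from the question, and it assumes the group is always the leading run of letters. Rows with no leading letters are left as NA, because regmatches() silently drops non-matches.

```r
# Example data taken from the question
data <- data.frame(
  ID = 1:3,
  TEXTFIELD = c("VGH2130", "BFGF2345", "VGH3321"),
  stringsAsFactors = FALSE
)

# Locate the leading run of letters in every value (vectorized, no row loop)
m <- regexpr("^[A-Za-z]+", data$TEXTFIELD)

# regmatches() returns only the matched pieces, so fill matching rows explicitly
data$group <- NA_character_
data$group[m > 0] <- regmatches(data$TEXTFIELD, m)

print(data)
```

Both calls operate on the whole column at once, so this should scale to millions of rows far better than a loop over rows.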

Cath · Answer

You can try this, if df is your data.frame:

df$NEWCOL <- gsub("^([A-Z]+)\\d+.*", "\\1", df$TEXTFIELD)

> df
#  ID TEXTFIELD NEWCOL
#1  1   VGH2130    VGH
#2  2  BFGF2345   BFGF
#3  3   VGH3321    VGH
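
The regex approaches above take whatever letters come first. If you want to restrict the result to the predefined vector of groups mentioned in the question, one possible sketch is to loop over the groups (not the rows) and use startsWith(), which is vectorized over the column. The name groups and its contents are hypothetical here, and the sketch assumes TEXTFIELD is a character column without NAs. Longer prefixes are tried first so that a group such as "VGHX" would win over "VGH".

```r
# Example data taken from the question
df <- data.frame(
  ID = 1:3,
  TEXTFIELD = c("VGH2130", "BFGF2345", "VGH3321"),
  stringsAsFactors = FALSE
)

# Hypothetical vector of known groups; replace with your real list
groups <- c("VGH", "BFGF")

# Try longer prefixes first so the most specific group is assigned
groups <- groups[order(nchar(groups), decreasing = TRUE)]

df$NEWCOL <- NA_character_
for (g in groups) {
  # Rows not yet assigned whose text starts with this group code
  hit <- is.na(df$NEWCOL) & startsWith(df$TEXTFIELD, g)
  df$NEWCOL[hit] <- g
}

print(df)
```

The loop runs once per group, so its cost depends on the number of groups rather than the number of rows. Any row that matches none of the groups stays NA, which makes unexpected values easy to spot.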

Tags: text, r, dataset, grouping

heythatsmekri
