Remove everything after the last underscore of a column in R [duplicate]

Tags:

regex

r

dplyr

I have a data frame, and for one particular column I want to strip out the last underscore and everything after it.

So:

test <- data.frame(label=c('test_test_test', 'test_tom_cat', 'tset_eat_food', 'tisk - tisk'), 
                   stuff=c('blah', 'blag', 'gah', 'nah') , 
                   numbers=c(1,2,3, 4))

should become

result <- data.frame(label=c('test_test', 'test_tom', 'tset_eat', 'tisk - tisk'), 
                   stuff=c('blah', 'blag', 'gah', 'nah') , 
                   numbers=c(1,2,3, 4))

I have got:

require(dplyr)
test %>%
  mutate(label = gsub('_.*','',label))

but that drops everything from the first underscore and gives me

wrong_result <- data.frame(label=c('test', 'test', 'tset', 'tisk - tisk'), 
                   stuff=c('blah', 'blag', 'gah', 'nah') , 
                   numbers=c(1,2,3, 4))
KillerSnail, asked Nov 29 '16 04:11


2 Answers

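We can use sub from base R, so no external packages are needed. The pattern _[^_]+$ matches an underscore followed by one or more non-underscore characters at the end of the string, i.e. the last underscore and everything after it: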

test$label <- sub("_[^_]+$", "", test$label)
test$label
#[1] "test_test"   "test_tom"    "tset_eat"    "tisk - tisk"
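
Since the question uses dplyr, the same pattern should also work inside mutate. A minimal sketch, assuming dplyr is installed; the test data frame is copied from the question, and the name cleaned is just an illustrative choice:

```r
library(dplyr)

# Sample data from the question; stringsAsFactors = FALSE keeps label
# as character on R versions older than 4.0.
test <- data.frame(label = c('test_test_test', 'test_tom_cat', 'tset_eat_food', 'tisk - tisk'),
                   stuff = c('blah', 'blag', 'gah', 'nah'),
                   numbers = c(1, 2, 3, 4),
                   stringsAsFactors = FALSE)

# Remove the last underscore and everything after it; rows without an
# underscore are left unchanged because the pattern does not match.
cleaned <- test %>%
  mutate(label = sub("_[^_]+$", "", label))

cleaned$label
#[1] "test_test"   "test_tom"    "tset_eat"    "tisk - tisk"
```

Only the first match per string is needed here, so sub is enough and gsub is not required.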
akrun, answered Nov 15 '22 15:11


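This will also work. The greedy (.*) captures everything up to the last underscore that is followed by word characters, and the backreference \\1 in the replacement puts that captured part back: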

gsub('(.*)_\\w+', '\\1', test$label)
#[1] "test_test"   "test_tom"    "tset_eat"    "tisk - tisk"
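
One caveat may be worth checking before using this on other data: the sub pattern in the other answer is anchored with $, while this gsub pattern is not, and \\w does not match spaces or hyphens. A small sketch of how the two can differ; the labels below are made up for illustration and are not from the question:

```r
# Hypothetical labels: the second one has non-word characters
# (spaces and a hyphen) after its last underscore.
x <- c("test_tom_cat", "tisk_tisk - tock")

# Anchored: removes the last underscore and everything up to the end.
anchored <- sub("_[^_]+$", "", x)

# Unanchored: removes the last underscore plus the word characters
# right after it, and keeps whatever follows.
unanchored <- gsub('(.*)_\\w+', '\\1', x)

anchored
#[1] "test_tom" "tisk"
unanchored
#[1] "test_tom"    "tisk - tock"
```

For the sample data in the question both give the same result, so this only matters if the text after the last underscore can contain characters outside [A-Za-z0-9_].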
Sandipan Dey, answered Nov 15 '22 14:11