
Separate a column into 2 columns at the last underscore in R

Tags: r, tidyr

I have a data frame like this:

id <- c("1","2","3")
col <- c("CHB_len_SCM_max","CHB_brf_SCM_min","CHB_PROC_S_SV_mean")

df <- data.frame(id,col)

I want to split "col" into two columns, Measurement and stat. The stat is the text after the last underscore (max, min, mean, etc.), and Measurement is everything before it.

My desired output is:

  id   Measurement stat
   1   CHB_len_SCM  max  
   2   CHB_brf_SCM  min   
   3 CHB_PROC_S_SV mean    

I tried it this way, but the stat column is empty. I am not sure whether my pattern is pointing at the last underscore.

library(tidyverse)
df1 <- df %>%
  # Separate the sensors and the summary statistic
  separate(col, into = c("Measurement", "stat"),sep = '\\_[^\\_]*$')

What am I missing here? Can someone point me in the right direction?

Sharath asked May 24 '18 21:05



1 Answer

We could use extract with two capture groups. The second group matches one or more characters that are not a _ up to the end ($) of the string, so the first group takes everything before the last underscore.

library(tidyverse)
df %>% 
   extract(col, into = c("Measurement", "stat"), "(.*)_([^_]+)$")
#   id   Measurement stat
#1  1   CHB_len_SCM  max
#2  2   CHB_brf_SCM  min
#3  3 CHB_PROC_S_SV mean
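
The same capture-group idea can be checked without tidyr. The following is only a minimal base R sketch, not part of the original answer: it applies the same pattern with sub() to the vector from the question, and the names pattern, measurement and stat are chosen here for illustration.

```r
# Same values as the "col" column in the question
col <- c("CHB_len_SCM_max", "CHB_brf_SCM_min", "CHB_PROC_S_SV_mean")

# Group 1: everything before the last underscore (greedy)
# Group 2: one or more non-underscore characters up to the end of the string
pattern <- "(.*)_([^_]+)$"

# Keep only the first or the second capture group
measurement <- sub(pattern, "\\1", col)
stat <- sub(pattern, "\\2", col)

print(data.frame(Measurement = measurement, stat = stat))
```

Because the second group cannot contain an underscore and must reach the end of the string, the greedy first group is forced to stop at the last underscore.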

Or use separate with a regex lookahead, so that only the last underscore is treated as the separator:

df %>% 
   separate(col, into = c("Measurement", "stat"), sep="_(?=[^_]+$)")
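
As for why the attempt in the question leaves stat empty: its sep pattern matches the whole trailing piece (for example "_max"), and separate discards whatever the separator matches, so nothing remains for the second column. The lookahead version matches only the underscore itself. A minimal base R sketch of that difference, using regmatches and strsplit rather than tidyr (the variable names are illustrative, not from the original post):

```r
col <- c("CHB_len_SCM_max", "CHB_brf_SCM_min", "CHB_PROC_S_SV_mean")

# Pattern from the question (without the redundant escapes):
# it matches the underscore AND the text after it
consumed <- regmatches(col, regexpr("_[^_]*$", col))

# Lookahead pattern from the answer: it matches only the last underscore
# and merely checks that non-underscore text follows up to the end
kept <- regmatches(col, regexpr("_(?=[^_]+$)", col, perl = TRUE))

# Splitting on the lookahead pattern keeps both pieces
parts <- strsplit(col, "_(?=[^_]+$)", perl = TRUE)

print(consumed)
print(kept)
print(parts)
```

Here consumed holds the full suffixes, including the stat text, while kept holds only single underscores, which is why the second pattern leaves the stat text available for the new column.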
akrun answered Oct 19 '22 10:10
