Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split colon- and equal-separated string into different columns in R

Tags:

regex

r

I have a dataframe , a column of which contains colon and equal-separated strings.

data$col1
  [1] "ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45"  
  [2] "ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05"  
  [3] "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82"
  [4] "DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03" 
  [5] "ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37"    
  [6] "ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29" 

I want to extract the numbers follow the NLOD= and TLOD=, and then split it into two columns. Here is the output I want.

data
                                                        col1     TLOD      NLOD
    "ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45"     5.45     38.78
    "ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05"     4.05     36.58
  "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82"     5.82     20.42
   "DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03"     8.03     30.70
      "ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37"     5.37     41.48
      "ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29"     5.29     40.59

Any help is appreciated. Thank you.

Reproducible sample data

structure(list(col1 = c("ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45", 
"ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05", "DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82", 
"DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03", "ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37", 
"ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
like image 632
Lin Caijin Avatar asked Nov 17 '25 02:11

Lin Caijin


1 Answers

In base R, you can use strcapture to capture the data into separate columns.

cbind(df, strcapture('NLOD=(.*?);TLOD=(.*)', df$col1, 
           proto = list(NLOD = numeric(), TLOD = numeric())))

#.                                                     col1  NLOD TLOD
#1   ECNT=2;HCNT=4;MAX_ED=51;MIN_ED=51;NLOD=38.78;TLOD=5.45 38.78 5.45
#2   ECNT=2;HCNT=8;MAX_ED=51;MIN_ED=51;NLOD=36.58;TLOD=4.05 36.58 4.05
#3 DB;ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=20.42;TLOD=5.82 20.42 5.82
#4  DB;ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=30.70;TLOD=8.03 30.70 8.03
#5     ECNT=2;HCNT=6;MAX_ED=7;MIN_ED=7;NLOD=41.48;TLOD=5.37 41.48 5.37
#6     ECNT=2;HCNT=9;MAX_ED=7;MIN_ED=7;NLOD=40.59;TLOD=5.29 40.59 5.29

To look specifically for numbers you can do :

cbind(df, strcapture('NLOD=(\\d+\\.\\d+);TLOD=(\\d+\\.\\d+)', df$col1, 
           proto = list(NLOD = numeric(), TLOD = numeric())))

Same regex can also be used in tidyr::extract :

tidyr::extract(df, col1, c('NLOD', 'TLOD'), 'NLOD=(.*?);TLOD=(.*)', remove = FALSE)
like image 176
Ronak Shah Avatar answered Nov 19 '25 15:11

Ronak Shah