I have a data frame (day.df) with a column vial which I am trying to split into four new columns: treatment, gender, line and block. The day.df dataframe also has the columns response and explanatory, which will be retained.
So day.df currently looks like this (first 4 of 31,000 rows):
    vial    response explanatory
    Xm1.1   0        4
    Xm2.1   0        4
    Xm3.1   0        4
    Xm4.1   0        4
    .       .        .
    .       .        .        
    .       .        .
The current contents of the vial column look like this: Xm1.2, where
X) is X or A: this will be the treatment.
m) is m or f: this is the gender.
1) ranges from 1 to 40: this is the line.
2) (after the dot) ranges from 1 to 4: this is the block.
As such, the new day.df will look something like this (I've used four "random" rows to illustrate the variation within each new column):
        vial    response explanatory  treatment gender line  block
        Xm1.1   0        4            X         m      1     1
        Am1.1   0        4            A         m      1     1
        Xf3.2   0        4            X         f      3     2
        Xm4.2   0        4            X         m      4     2
        .       .        .
        .       .        .        
        .       .        .
I've taken a look around online for how to do this and this is the closest I came; I tried to split the vial column like this...  
 > a=strsplit(day.df$vial,"")
 > a[1] "Xm1.2"
but ran into problems when the "line" part of the string is greater than 9, because it then has two characters, e.g. for the row where vial is Af20.2:
 > a[300]
 [[1]]
 [1] "A" "f" "2" "0" "." "2"
Should read as:
 > a[300]
 [[1]]
 [1] "A" "f" "20" "." "2"
So the steps I need help solving are:
1) Splitting out the line section of the string when it is over 9.
2) Putting the pieces back into the day.df dataframe in the four required columns.
strsplit() splits a string wherever it matches a pattern (a regular expression, unless fixed = TRUE is set). To split one column into a fixed number of new columns, the str_split_fixed() function from the stringr package splits each string into exactly n pieces.
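strsplit(x, "") splits into single characters, which is why "20" falls apart. Splitting on the dot instead, then slicing the prefix by position, keeps a two-digit line number whole. This is a minimal base-R sketch, assuming every vial code is one treatment letter, one gender letter, the line number, a dot, and the block; the sample values are made up for illustration. If you prefer stringr, str_split_fixed(vial, "\\.", 2) returns an equivalent two-column matrix for the first step.

```r
# Hypothetical sample of vial codes, including a two-digit line number
vial <- c("Xm1.1", "Am1.1", "Xf3.2", "Af20.2")

# Split at the literal dot; fixed = TRUE stops "." being read as a regex wildcard
parts <- do.call(rbind, strsplit(vial, ".", fixed = TRUE))
# parts[, 1] holds e.g. "Af20", parts[, 2] holds the block digit
prefix <- parts[, 1]

split_df <- data.frame(
  treatment = substr(prefix, 1, 1),             # 1st character
  gender    = substr(prefix, 2, 2),             # 2nd character
  line      = substr(prefix, 3, nchar(prefix)), # everything after the 2nd character
  block     = parts[, 2],                       # part after the dot
  stringsAsFactors = FALSE
)
split_df
```

Because the line number is taken as "everything between the gender letter and the dot", it does not matter whether it has one or two digits.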
Using gsub and strsplit like this:
v <- c('Xm1.1','Xf3.2')
h <- gsub('(X|A)(m|f)([0-9]{1,2})[.]([1-4])','\\1|\\2|\\3|\\4',v)
do.call(rbind,strsplit(h,'[|]'))
    [,1] [,2] [,3] [,4]
[1,] "X"  "m"  "1"  "1" 
[2,] "X"  "f"  "3"  "2" 
The result is a character matrix, which you can cbind to your original data.frame.
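One caveat with the gsub approach: a code that does not match the pattern is left unchanged, so strsplit() returns a single piece for it and rbind() silently recycles that piece across all four columns. A minimal sketch of checking the codes first, assuming you want to warn about and skip malformed values; the sample vector (including the bad value "Zq9.9") is made up, and the ^ and $ anchors are an addition to the answer's pattern.

```r
v <- c("Xm1.1", "Xf3.2", "Af20.2", "Zq9.9")  # last value is deliberately malformed
pattern <- "^(X|A)(m|f)([0-9]{1,2})[.]([1-4])$"

# Flag codes that do not fit treatment/gender/line.block
ok <- grepl(pattern, v)
if (!all(ok)) {
  warning("Unparsed vial codes: ", paste(v[!ok], collapse = ", "))
}

# Only split the codes that matched, so every row yields exactly four pieces
h <- gsub(pattern, "\\1|\\2|\\3|\\4", v[ok])
m <- do.call(rbind, strsplit(h, "[|]"))
m
```

In a real data frame you would keep the ok vector to subset the rows that get the new columns, or fill the unmatched ones with NA.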
EDIT (@GriffinEvo): applied and tested code:
 # Rewrite each code as e.g. "X|m|1|1" so the four parts are "|"-separated
 a = gsub('(X|A)(m|f)([0-9]{1,2})[.]([1-4])',
           '\\1|\\2|\\3|\\4', day.df$vial)
 # Split on "|", bind the pieces row-wise and append them as columns 4 to 7
 day.df = cbind(day.df, do.call(rbind, strsplit(a, '[|]')))
 colnames(day.df)[4:7] = c("treatment", "gender", "line", "block")
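After the cbind() step the new columns hold text, so line sorts as "1", "10", "2" rather than numerically. A minimal sketch of converting them, assuming line and block should be integers and treatment and gender factors (whether block is better treated as a number or a factor depends on your analysis). The small day.df below is a hypothetical stand-in. as.character() is applied first because older R versions (before 4.0) turn strings into factors in cbind(), and as.integer() on a factor returns its internal codes, not the numbers.

```r
# Hypothetical stand-in for day.df after the cbind() step
day.df <- data.frame(
  vial      = c("Xm1.1", "Xm10.2", "Af2.1"),
  treatment = c("X", "X", "A"),
  gender    = c("m", "m", "f"),
  line      = c("1", "10", "2"),
  block     = c("1", "2", "1"),
  stringsAsFactors = FALSE
)

sort(day.df$line)  # character sort: "1" "10" "2"

# as.character() first, so this also works if the columns came back as factors
day.df$line      <- as.integer(as.character(day.df$line))
day.df$block     <- as.integer(as.character(day.df$block))
day.df$treatment <- factor(day.df$treatment)
day.df$gender    <- factor(day.df$gender)

sort(day.df$line)  # numeric sort: 1 2 10
str(day.df)
```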
Read the data:
Lines <- "vial    response explanatory
    Xm1.1   0        4
    Xm2.1   0        4
    Xm3.1   0        4
    Xm4.1   0        4
"
day.df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
1) Then process it using strapplyc from the gsubfn package. (We used as.is = TRUE so that day.df$vial is character; if it's a factor in your data frame, replace day.df$vial with as.character(day.df$vial).) This approach does the parsing in just one short line of code:
library(gsubfn)    
s <- strapplyc(day.df$vial, "(.)(.)(\\d+)[.](.)", simplify = rbind)
# we can now cbind it to the original data frame
colnames(s) <- c("treatment", "gender", "line", "block")
cbind(day.df, s)
which gives:
  vial response explanatory treatment gender line block
1 Xm1.1        0           4         X      m    1     1
2 Xm2.1        0           4         X      m    2     1
3 Xm3.1        0           4         X      m    3     1
4 Xm4.1        0           4         X      m    4     1
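If installing gsubfn is not an option, base R's regexec() and regmatches() can capture the same groups. This is a minimal sketch, not part of the original answer: it reuses the answer's data-reading step with a made-up two-digit row (Af20.2) to show that the line number stays whole, and it writes the digit class as [0-9]+ in place of \\d+.

```r
Lines <- "vial    response explanatory
    Xm1.1   0        4
    Af20.2  0        4
"
day.df <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# regexec() records the whole match plus each capture group;
# regmatches() extracts them as a list of character vectors
m <- regmatches(day.df$vial, regexec("(.)(.)([0-9]+)[.](.)", day.df$vial))

# Drop element 1 (the full match) and bind the four groups row-wise.
# A code that does not match gives character(0) and would be dropped here.
s <- do.call(rbind, lapply(m, `[`, -1))
colnames(s) <- c("treatment", "gender", "line", "block")

out <- cbind(day.df, s)
out
```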
2) Here is a different approach. This does not use any packages and is relatively simple (no regular expressions at all) and only involves one R statement including the cbind'ing:
transform(day.df,
 treatment = substring(vial, 1, 1),        # 1st char
 gender = substring(vial, 2, 2),           # 2nd char
 line = substring(vial, 3, nchar(vial)-2), # 3rd through 2 prior to last char
 block = substring(vial, nchar(vial)))     # last char
The result is as before.
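The same transform() layout also works with sub() and backreferences instead of character positions, which may read more clearly if the code format ever changes. This is a variant sketch, not the answer's method: it reuses the strapplyc pattern with ^ and $ anchors added, and the two-row data frame is made up. Note that sub() returns a non-matching string unchanged, so a malformed code would end up copied into all four columns.

```r
day.df <- data.frame(vial = c("Xm1.1", "Af20.2"), response = 0, explanatory = 4,
                     stringsAsFactors = FALSE)

# One anchored pattern; each new column keeps a different capture group
pat <- "^(.)(.)([0-9]+)[.](.)$"
out <- transform(day.df,
  treatment = sub(pat, "\\1", vial),
  gender    = sub(pat, "\\2", vial),
  line      = sub(pat, "\\3", vial),
  block     = sub(pat, "\\4", vial))
out
```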
UPDATE: Added second approach.
UPDATE: Some simplifications.