
R split text string into last and first elements

Tags:

r

I'm looking for help splitting a column of compound names into two columns, one for the first name and one for the last name.

 df <- data.frame( PREFIX=c("A_B","A_C","A_D","B_A","A_B_C","B_D_E","C_B_A","B_A"),
         VALUE=c(1,2,3,4,5,6,7,8) )

The following handles the first part of the task, but I couldn't figure out how to select the last element when the number of underscore-separated parts varies from row to row:

 # split PREFIX into new columns
 df$name1 = as.character(lapply(strsplit(as.character(df$PREFIX), split="_"), "[", 1))
user45367 asked May 05 '14 04:05



2 Answers

You can use tail to grab the last element:

df$name2 = as.character(lapply(strsplit(as.character(df$PREFIX), split="_"),
                        tail, n=1))
df
#   PREFIX VALUE name1 name2
# 1    A_B     1     A     B
# 2    A_C     2     A     C
# 3    A_D     3     A     D
# 4    B_A     4     B     A
# 5  A_B_C     5     A     C
# 6  B_D_E     6     B     E
# 7  C_B_A     7     C     A
# 8    B_A     8     B     A
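A variant of the same idea, as a minimal sketch: split once, keep the pieces in a list (here called `parts`, a name chosen only for this example), and pull both ends out with `vapply`, which checks that each result is a single string.

```r
# Same data as in the question.
df <- data.frame(PREFIX = c("A_B", "A_C", "A_D", "B_A", "A_B_C", "B_D_E", "C_B_A", "B_A"),
                 VALUE = c(1, 2, 3, 4, 5, 6, 7, 8))

# Split once and reuse the list of pieces for both columns.
# fixed = TRUE treats "_" as a literal string rather than a regular expression.
parts <- strsplit(as.character(df$PREFIX), split = "_", fixed = TRUE)

# vapply() requires every result to be exactly one character string.
df$name1 <- vapply(parts, function(p) head(p, n = 1), character(1))
df$name2 <- vapply(parts, function(p) tail(p, n = 1), character(1))
```

If a PREFIX value were an empty string, `strsplit` would return a zero-length vector for it and `vapply` would stop with an error instead of silently producing a malformed column.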
josliber answered Oct 27 '22 13:10



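You can also use a "greedy" regular expression. The pattern `_|_.*_` splits either on a single underscore or, when a string has several, on everything from the first underscore through the last one, so only the first and last elements remain: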

cbind(df, do.call(rbind, strsplit(as.character(df$PREFIX), "_|_.*_")))
#   PREFIX VALUE 1 2
# 1    A_B     1 A B
# 2    A_C     2 A C
# 3    A_D     3 A D
# 4    B_A     4 B A
# 5  A_B_C     5 A C
# 6  B_D_E     6 B E
# 7  C_B_A     7 C A
# 8    B_A     8 B A
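The greedy behaviour can also be seen without splitting at all. The following is a sketch using `sub`, a different technique from the `strsplit` call above; the vector name `prefix` and the result names `first` and `last` are chosen only for this example.

```r
prefix <- c("A_B", "A_C", "A_D", "B_A", "A_B_C", "B_D_E", "C_B_A", "B_A")

# "_.*" matches from the first underscore to the end of the string,
# so removing it leaves only the first element.
first <- sub("_.*", "", prefix)

# ".*_" is greedy: it extends to the last underscore,
# so removing it leaves only the last element.
last <- sub(".*_", "", prefix)
```

A value with no underscore matches neither pattern, so both `first` and `last` would simply return it unchanged.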
A5C1D2H2I1M1N2O1R2T1 answered Oct 27 '22 14:10
