
"last name, first name" -> "first name last name" in serialized strings

Tags: string, regex, r

I have a bunch of strings that contain lists of names in last name, first name format, separated by commas, like so:

names <- c('Beaufoy, Simon, Boyle, Danny','Nolan, Christopher','Blumberg, Stuart, Cholodenko, Lisa','Seidler, David','Sorkin, Aaron')

What's the easiest way to convert all these names within the strings to first name last name format?

RoyalTS asked Jan 23 '13 16:01


3 Answers

If you can be certain that a comma isn't going to be in a person's name, this might work:

mynames <- c('Beaufoy, Simon, Boyle, Danny',
             'Nolan, Christopher',
             'Blumberg, Stuart, Cholodenko, Lisa',
             'Seidler, David',
             'Sorkin, Aaron',
             'Hoover, J. Edgar')
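# Split each string on ", " so every name yields two pieces: last, then first
mynames2 <- strsplit(mynames, ", ")

# Even positions hold first names, odd positions hold last names
unlist(lapply(mynames2, 
              function(x) paste(x[1:length(x) %% 2 == 0], 
                                x[1:length(x) %% 2 != 0])))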
# [1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
# [4] "Stuart Blumberg"   "Lisa Cholodenko"   "David Seidler"    
# [7] "Aaron Sorkin"      "J. Edgar Hoover"        

I've added J. Edgar Hoover in there for good measure.

If you want the names that were quoted together to stay together, add collapse = ", " to your paste() function:

unlist(lapply(mynames2, 
              function(x) paste(x[1:length(x) %% 2 == 0], 
                                x[1:length(x) %% 2 != 0],
                                collapse = ", ")))
# [1] "Simon Beaufoy, Danny Boyle"       "Christopher Nolan"               
# [3] "Stuart Blumberg, Lisa Cholodenko" "David Seidler"                   
# [5] "Aaron Sorkin"                     "J. Edgar Hoover"    
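
The odd/even indexing above only works when every string splits into an even number of pieces. As a minimal sketch of the same idea with that assumption checked explicitly, the helper below wraps the steps in a function; `flip_names` is a hypothetical name used for illustration and is not part of the answer.

```r
# Hypothetical helper (illustrative name): same split-and-swap idea as above,
# with an explicit check that each string holds complete "last, first" pairs.
flip_names <- function(x, collapse = ", ") {
  parts <- strsplit(x, ", ", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) %% 2 != 0) {
      stop("expected an even number of comma-separated pieces")
    }
    idx <- seq_along(p)
    last  <- p[idx %% 2 == 1]  # odd positions: last names
    first <- p[idx %% 2 == 0]  # even positions: first names
    paste(first, last, collapse = collapse)
  }, character(1))
}

flip_names(c("Beaufoy, Simon, Boyle, Danny", "Hoover, J. Edgar"))
# [1] "Simon Beaufoy, Danny Boyle" "J. Edgar Hoover"
```

A string such as "Madonna" (one piece only) stops with an error here instead of silently producing a malformed name.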

A5C1D2H2I1M1N2O1R2T1 answered Nov 05 '22 13:11


(1) Maintain same names in each element. This can be done with a single gsub (assuming there are no commas within names):

> gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", names)
[1] "Simon Beaufoy, Danny Boyle"       "Christopher Nolan"               
[3] "Stuart Blumberg, Lisa Cholodenko" "David Seidler"                   
[5] "Aaron Sorkin"    

> gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", "Hoover, J. Edgar")
[1] "J. Edgar Hoover"
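
For anyone unpicking the pattern, here is a rough sketch that builds the same regular expression from two labelled pieces and shows what each match covers before the swap. The variable names `last_name`, `first_name`, `pattern` and `x` are illustrative only.

```r
# The same pattern as above, assembled from labelled pieces (labels are illustrative).
last_name  <- "([^, ][^,]*)"  # starts with a char that is neither comma nor space, runs up to the next comma
first_name <- "([^,]+)"       # everything up to the next comma or the end of the string
pattern <- paste0(last_name, ", ", first_name)

x <- "Blumberg, Stuart, Cholodenko, Lisa"

# Each match is one "last, first" pair
regmatches(x, gregexpr(pattern, x))[[1]]
# [1] "Blumberg, Stuart" "Cholodenko, Lisa"

# \\2 \\1 swaps the two groups inside every match
gsub(pattern, "\\2 \\1", x)
# [1] "Stuart Blumberg, Lisa Cholodenko"
```

Because the first group cannot begin with a comma or a space, the ", " that separates two people is never absorbed into a match, so it survives as the separator in the output.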

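(2) Separate into one name per element. If you want each first name last name in a separate element, use (a) scan, with strip.white = TRUE so the space after each separating comma is dropped:

scan(text = out, sep = ",", what = "", strip.white = TRUE)

where out is the result of the gsub above. Alternatively, to get the result directly, try (b) strapply from the gsubfn package: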

> library(gsubfn)
> strapply(names, "([^, ][^,]*), ([^,]+)", x + y ~ paste(y, x), simplify = c)
[1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
[4] "Stuart Blumberg"   "Lisa Cholodenko"   "David Seidler"    
[7] "Aaron Sorkin"     

> strapply("Hoover, Edgar J.", "([^, ][^,]*), ([^,]+)", x + y ~ paste(y, x), 
+   simplify = c)
[1] "Edgar J. Hoover"

Note that all examples above used the same regular expression for matching.
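
Option (a) can be sketched end to end in base R as follows. The input is a subset of the question's data, and `input` and `out` are illustrative names.

```r
# Subset of the question's data; `input` and `out` are illustrative names.
input <- c("Beaufoy, Simon, Boyle, Danny", "Nolan, Christopher")

# Step 1: swap each "last, first" pair in place
out <- gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", input)

# Step 2: split on the remaining commas; strip.white = TRUE removes the
# space after each comma, quiet = TRUE suppresses the "Read N items" message
scan(text = out, sep = ",", what = "", strip.white = TRUE, quiet = TRUE)
# [1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
```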

UPDATE: removed comma separating first and last name.

UPDATE: added code to separate out each first name last name into a separate element in case that is the preferred output format.

G. Grothendieck answered Nov 05 '22 14:11


I'm in favor of @AnandaMahto's answer, but just for fun, here is another method using scan, split, and rapply.

names <- c(names, 'Chambers, John, Ihaka, Ross, Gentleman, Robert')

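# extract names: one character vector of pieces per input string
snames <- lapply(names, function(x) {
  scan(text = x, what = '', sep = ',', strip.white = TRUE, quiet = TRUE)
})

# break up names: group consecutive pieces into (last, first) pairs
snames <- lapply(snames, function(x) split(x, rep(seq(length(x) %/% 2), each = 2)))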

# collapse together, reversed
rapply(snames, function(x) paste(x[2:1], collapse=' '))
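
The grouping step is the least obvious part, so here is a small sketch of it on a single, already-split string. It uses `vapply` rather than `rapply` only to keep the output flat and unnamed; `x`, `grp` and `pairs` are illustrative names.

```r
# Pieces as scan() would return them for 'Chambers, John, Ihaka, Ross, Gentleman, Robert'
x <- c("Chambers", "John", "Ihaka", "Ross", "Gentleman", "Robert")

# One group id per pair of pieces: 1 1 2 2 3 3
grp <- rep(seq(length(x) %/% 2), each = 2)

# split() turns the flat vector into a list of (last, first) pairs
pairs <- split(x, grp)

# Reverse each pair and glue it together
unname(vapply(pairs, function(p) paste(p[2:1], collapse = " "), character(1)))
# [1] "John Chambers"    "Ross Ihaka"       "Robert Gentleman"
```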

Matthew Plourde answered Nov 05 '22 15:11