Here is my data set:
FullName <- c("Jimmy John Cephus", "Frank Chester", "Hank Chester", "Brody Buck Clyde", "Merle Rufus Roscoe Jed Quaid")
df <- data.frame(FullName)
Goal: Look into FullName for any spaces, " ", and extract out the FirstName.
My first step is to use the stringr library, because I need its str_count() and word() functions.
Next I test str_count() against the df:
stringr::str_count(df$FullName, " ")
R returns:
[1] 2 1 1 2 4
This is what I expect.
Next I test the word() function:
stringr::word(df$FullName, 1)
R returns:
[1] "Jimmy" "Frank" "Hank" "Brody" "Merle"
Again, this is what I expect.
Next I construct a simple UDF (user-defined function) that incorporates the str_count() function:
split_firstname = function(full_name){
  x <- stringr::str_count(full_name, " ")
  return(x)
}
split_firstname(df$FullName)
Again, R provides what I expect:
[1] 2 1 1 2 4
As a final step, I incorporate the word() function into the UDF and code for all of the conditions:
split_firstname = function(full_name){
  x <- stringr::str_count(full_name, " ")
  if(x==1){
    return(stringr::word(full_name,1))
  }else if(x==2){
    return(paste(stringr::word(full_name,1), stringr::word(full_name,2), sep = " "))
  }else if(x==4){
    return(paste(stringr::word(full_name,1), stringr::word(full_name,2), stringr::word(full_name,3), stringr::word(full_name,4), sep = " "))
  }
}
Then I call the UDF and pass to it the FullName from the df:
split_firstname(df$FullName)
This time I did NOT get what I expected. R returned:
[1] "Jimmy John" "Frank Chester" "Hank Chester" "Brody Buck" "Merle Rufus"
Warning messages:
1: In if (x == 1) { :
the condition has length > 1 and only the first element will be used
2: In if (x == 2) { :
the condition has length > 1 and only the first element will be used
I had expected R to return to me the following:
"Jimmy John", "Frank", "Hank", "Brody Buck", "Merle Rufus Roscoe Jed"
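The problem is that you are using an if statement with a vector. if expects a single TRUE or FALSE, so when it is given a vector R only looks at the first element (hence the warnings; since R 4.2.0 this is an error instead). You can use the case_when function from dplyr, which evaluates each condition element by element.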
library(dplyr)
split_firstname <- function(full_name){
  x <- stringr::str_count(full_name, " ")
  case_when(
    x == 1 ~ stringr::word(full_name, 1),
    x == 2 ~ paste(stringr::word(full_name,1), stringr::word(full_name,2), sep = " "),
    x == 4 ~ paste(stringr::word(full_name,1), stringr::word(full_name,2), stringr::word(full_name,3), stringr::word(full_name,4), sep = " ")
  )
}
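As a quick check, the function above can be called on the whole vector. The sketch below is self-contained (it repeats the sample names and the case_when() version) and assumes the stringr and dplyr packages are installed. The four-word name at the end is a made-up extra, added only to show that a space count which none of the conditions covers comes back as NA.

```r
library(dplyr)

full_names <- c("Jimmy John Cephus", "Frank Chester", "Hank Chester",
                "Brody Buck Clyde", "Merle Rufus Roscoe Jed Quaid")

split_firstname <- function(full_name) {
  # Number of spaces in each name; a vector as long as full_name
  x <- stringr::str_count(full_name, " ")
  # Each condition is checked element by element
  case_when(
    x == 1 ~ stringr::word(full_name, 1),
    x == 2 ~ paste(stringr::word(full_name, 1), stringr::word(full_name, 2)),
    x == 4 ~ paste(stringr::word(full_name, 1), stringr::word(full_name, 2),
                   stringr::word(full_name, 3), stringr::word(full_name, 4))
  )
}

first_names <- split_firstname(full_names)
print(first_names)

# Hypothetical name with three spaces: no condition matches, so the result is NA
unmatched <- split_firstname("Ann Bea Cat Dee")
print(unmatched)
```

If other space counts can occur in the real data, each one needs its own condition (or a final TRUE ~ ... fallback), otherwise those rows silently become NA.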
lukeA's answer is the best approach, but if you find you are unable to vectorise a function, sapply from base R and rowwise from dplyr can solve this problem too. Both apply split_firstname to one name at a time:
df$first <- sapply(df$FullName, split_firstname)
head(df)
                      FullName                  first
1            Jimmy John Cephus             Jimmy John
2                Frank Chester                  Frank
3                 Hank Chester                   Hank
4             Brody Buck Clyde             Brody Buck
5 Merle Rufus Roscoe Jed Quaid Merle Rufus Roscoe Jed
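For completeness, here is a self-contained sketch of the sapply route, using the if/else function from the question. Because sapply calls the function once per name, every if condition has length one. The USE.NAMES = FALSE argument and the vapply variant are additions of mine, not part of the answer: the first keeps the result from being named after the input strings, the second additionally checks that every call returns exactly one string. The sketch assumes stringr is installed.

```r
full_names <- c("Jimmy John Cephus", "Frank Chester", "Hank Chester",
                "Brody Buck Clyde", "Merle Rufus Roscoe Jed Quaid")
df <- data.frame(FullName = full_names, stringsAsFactors = FALSE)

# The if/else function from the question; it only works on a single name
split_firstname <- function(full_name) {
  x <- stringr::str_count(full_name, " ")
  if (x == 1) {
    return(stringr::word(full_name, 1))
  } else if (x == 2) {
    return(paste(stringr::word(full_name, 1), stringr::word(full_name, 2)))
  } else if (x == 4) {
    return(paste(stringr::word(full_name, 1), stringr::word(full_name, 2),
                 stringr::word(full_name, 3), stringr::word(full_name, 4)))
  }
}

# One call per element, so each condition is a single TRUE/FALSE
df$first <- sapply(df$FullName, split_firstname, USE.NAMES = FALSE)

# Stricter variant: stops with an error unless every call returns one string
first_checked <- vapply(df$FullName, split_firstname, character(1),
                        USE.NAMES = FALSE)
print(df)
```

Note that the function returns NULL for a space count it has no branch for, in which case sapply would hand back a list and vapply would stop with an error.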
library(dplyr)
df <- df %>%
  rowwise() %>%
  mutate(split2 = split_firstname(FullName))
head(df)
  FullName                     first                  split2
  <fctr>                       <chr>                  <chr>
1 Jimmy John Cephus            Jimmy John             Jimmy John
2 Frank Chester                Frank                  Frank
3 Hank Chester                 Hank                   Hank
4 Brody Buck Clyde             Brody Buck             Brody Buck
5 Merle Rufus Roscoe Jed Quaid Merle Rufus Roscoe Jed Merle Rufus Roscoe Jed
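Going by the expected output in the question, the first name appears to be everything except the final word. Assuming that rule holds for every name (my assumption, it is not stated anywhere in the thread), the branching on the number of spaces may not be needed at all: stringr::word() accepts negative positions, which count backwards from the last word. A minimal sketch, with all_but_last_word as a hypothetical helper name:

```r
full_names <- c("Jimmy John Cephus", "Frank Chester", "Hank Chester",
                "Brody Buck Clyde", "Merle Rufus Roscoe Jed Quaid")

# Hypothetical helper: keep word 1 through the second-to-last word.
# end = -2 means "second word counting from the end".
all_but_last_word <- function(full_name) {
  stringr::word(full_name, start = 1, end = -2)
}

first_names <- all_but_last_word(full_names)
print(first_names)
```

For the five sample names this gives the same values as the expected result in the question, and it is already vectorised. Names consisting of a single word are not handled by this sketch and would need separate treatment.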