Say I have a vector of people's names in my dataframe:
names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
"Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
"Darryl Graham", "Arthur Hoffman")
I want to create a vector with the first names. All I know about them is that they come first in the vector above and that they're followed by a space. In other words, this is what I'm looking for:
"Bernice" "Dianna" "Philip" "Laurie" "Rochelle"
"Arturo" "Enrique" "Sarah" "Darryl" "Arthur"
I've found a similar question here, but the answers (especially this one) haven't helped much. So far, I've tried a couple of variations of functions from the grep family, and the closest I could get to something useful was by running strsplit(names, " ") to split each name into its parts and then strsplit(names, " ")[[1]][1] to get just the first name of the first person. I've been trying to tweak this last command to give me a whole vector of first names, to no avail.
Use sapply to extract the first name:
> sapply(strsplit(names, " "), `[`, 1)
[1] "Bernice" "Dianna" "Philip" "Laurie" "Rochelle" "Arturo" "Enrique"
[8] "Sarah" "Darryl" "Arthur"
Some comments:
The above works just fine. To make it a bit more general, you could change the split argument of strsplit from " " to "\\s+", which also covers multiple spaces. You could also use gsub to directly extract everything before the first space. This last approach uses only one function call and is likely to be faster (but I haven't checked with a benchmark).
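As a rough sketch of the two suggestions in this comment, the snippet below splits on "\\s+" and, separately, strips everything from the first whitespace character onward with a single regex call. It uses sub rather than gsub, since the pattern can match at most once per string; the variable names first_split and first_sub are made up for illustration, and no claim is made here about which version is faster.

```r
# Sample data from the question.
names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
           "Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
           "Darryl Graham", "Arthur Hoffman")

# Split on one or more whitespace characters, then keep the first piece
# of each name.
first_split <- sapply(strsplit(names, "\\s+"), `[`, 1)

# Delete everything from the first whitespace character to the end of
# the string, leaving only the first word.
first_sub <- sub("\\s.*$", "", names)
```

Note that the sub version would return an empty string for a name with leading whitespace, so trimming the input first (for example with trimws) may be worth considering.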
For what you want, here's a pretty unorthodox way to do it:
read.table(text = names, header = FALSE, stringsAsFactors=FALSE, fill = TRUE)[[1]]
# [1] "Bernice" "Dianna" "Philip" "Laurie" "Rochelle" "Arturo" "Enrique" "Sarah"
# [9] "Darryl" "Arthur"
This seems to work:
unlist(strsplit(names,' '))[seq(1,2*length(names),2)]
This assumes every name consists of exactly two words, i.e. no first or last name has a space in it.
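To illustrate why that assumption matters, here is a small sketch with a made-up input (the vector people is hypothetical and not part of the question) in which one name has three words. The positional indexing drifts out of step, while taking the first piece of each name separately does not.

```r
# Hypothetical input: the first person has a middle name.
people <- c("Mary Ann Smith", "Dianna Dean")

# Positional approach: flattens all pieces and picks elements 1 and 3,
# which only lines up when every name has exactly two words.
flat <- unlist(strsplit(people, " "))[seq(1, 2 * length(people), 2)]

# Per-element approach: takes the first piece of each name on its own.
per_name <- vapply(strsplit(people, " "), `[`, character(1), 1)
```

Here flat ends up as "Mary" "Smith", whereas per_name is "Mary" "Dianna".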