I have names from a property database mostly organized by "LastName FirstName MI", but some have husband and wife names and go "LastName FirstName & FirstName", and I want to split that into two rows. An example I found here get's me started, but I can't figure out how to get it to work, and it's also downvoted into the negatives.
This is the data and where I'm at:
Id = c("id1", "id2", "id3", "id4", "id5", "id6", "id7")
Name = c("Berry Marry & Paul", "Horrowitz Anthony", "Lawrence Jennifer & Chris", "Jones John", "Rover Red & Clifford", "Jagger Mick", "Arthur King & Gweniverre")
df = data.frame(Id, Name)
cbind(df,t(sapply(str_split(df$Name, " & "), unlist))) %>%
gather(newProd, values,`1`,`2`) %>%
arrange(key, values)
This is what I want:
Id Name
1 id1 Berry Marry
2 id1 Berry Paul
3 id2 Horrowitz Anthony
4 id3 Lawrence Jennifer
5 id3 Lawrence Chris
6 id4 Jones John
7 id5 Rover Red
8 id5 Rover Clifford
9 id6 Jagger Mick
10 id7 Arthur King
11 id7 Arthur Gweniverre
Advice always appreciated, thanks!
Here is a readable way to do it. You can then drop the LastName and FirstName columns, but I left them in to show how it works.
library(dplyr)
library(tidyr)
library(stringr)
df %>%
mutate(LastName = word(Name, 1),
FirstName = word(Name, 2, -1)) %>%
separate_rows(FirstName, sep = " & ") %>%
mutate(Name = paste(LastName, FirstName))
Which gives:
# A tibble: 11 x 4
Id Name LastName FirstName
<chr> <chr> <chr> <chr>
1 id1 Berry Marry Berry Marry
2 id1 Berry Paul Berry Paul
3 id2 Horrowitz Anthony Horrowitz Anthony
4 id3 Lawrence Jennifer Lawrence Jennifer
5 id3 Lawrence Chris Lawrence Chris
6 id4 Jones John Jones John
7 id5 Rover Red Rover Red
8 id5 Rover Clifford Rover Clifford
9 id6 Jagger Mick Jagger Mick
10 id7 Arthur King Arthur King
11 id7 Arthur Gweniverre Arthur Gweniverre
How about this:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.1.2
library(stringr)
Id = c("id1", "id2", "id3", "id4", "id5", "id6", "id7")
Name = c("Berry Marry & Paul", "Horrowitz Anthony A", "Lawrence Jennifer & Chris", "Jones John", "Rover Red & Clifford", "Jagger Mick", "Arthur King & Gweniverre")
df = data.frame(Id, Name)
df %>%
mutate(Name = gsub(" & ", "&", Name, fixed=TRUE),
Name = gsub("(.*)\\s([A-Z])$", "\\1_\\2", Name)) %>%
separate(Name, "\\s", into = c("last", "first")) %>%
mutate(first = str_split(first, "&")) %>%
unnest(first) %>%
mutate(first = gsub("_", " ", first))
#> # A tibble: 11 × 3
#> Id last first
#> <chr> <chr> <chr>
#> 1 id1 Berry Marry
#> 2 id1 Berry Paul
#> 3 id2 Horrowitz Anthony A
#> 4 id3 Lawrence Jennifer
#> 5 id3 Lawrence Chris
#> 6 id4 Jones John
#> 7 id5 Rover Red
#> 8 id5 Rover Clifford
#> 9 id6 Jagger Mick
#> 10 id7 Arthur King
#> 11 id7 Arthur Gweniverre
The code replaces " & " with "&", so the only spaces left are between the last name and the first, first name. Then it splits the last name from the first names. Finally, it splits the first names into a list and then uses unnest to turn each element into a row.
Created on 2022-03-28 by the reprex package (v2.0.1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With