I have a data frame that looks like this:
df <- data.frame(
  x = c(
    "800 Block of MAIN ST",
    "100 Block of CHESTNUT AV",
    "BAY ST / WELLINGTON ST",
    "LARKIN ST / ELLIS ST",
    "MAPLE ST / WELLINGTON ST",
    "MEANDERING RD / MAIN ST"),
  y = rnorm(6))
For each row of x, I want to extract the first street name (e.g. MEANDERING) and the last street type (e.g. ST) into two new columns.
Desired Output:
                         x          y        x.1 x.2
1     800 Block of MAIN ST -0.6745405       MAIN  ST
2 100 Block of CHESTNUT AV -1.1316017   CHESTNUT  AV
3   BAY ST / WELLINGTON ST  1.2887577        BAY  ST
4     LARKIN ST / ELLIS ST  1.4606264     LARKIN  ST
5 MAPLE ST / WELLINGTON ST  0.6538595      MAPLE  ST
6  MEANDERING RD / MAIN ST  0.8472322 MEANDERING  ST
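One option is str_extract from the stringr package:
library(stringr)
# street: first run of 3+ capital letters; type: run of capitals at the end of the string
df[, c("street", "type")] <- list(str_extract(df$x, "[A-Z]{3,}"), str_extract(df$x, "[A-Z]+$"))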
#                          x          y     street type
# 1     800 Block of MAIN ST  0.7787495       MAIN   ST
# 2 100 Block of CHESTNUT AV -0.7069777   CHESTNUT   AV
# 3   BAY ST / WELLINGTON ST -0.2365061        BAY   ST
# 4     LARKIN ST / ELLIS ST  0.1399500     LARKIN   ST
# 5 MAPLE ST / WELLINGTON ST -0.3423978      MAPLE   ST
# 6  MEANDERING RD / MAIN ST  0.6434969 MEANDERING   ST
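Another option is base R, using sub with a capture group:
# Keep only the first run of 3+ capital letters (the first street name)
df <- within(df, st_name <- sub(".*?([A-Z]{3,}).*", "\\1", x, perl = TRUE))
# Keep only the final run of capital letters (the last street type)
df <- within(df, st_type <- sub(".+? ([A-Z]+)$", "\\1", x, perl = TRUE))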
#                         x           y    st_name st_type
#1     800 Block of MAIN ST  1.92908789       MAIN      ST
#2 100 Block of CHESTNUT AV  0.02487045   CHESTNUT      AV
#3   BAY ST / WELLINGTON ST -2.33411242        BAY      ST
#4     LARKIN ST / ELLIS ST -1.17946144     LARKIN      ST
#5 MAPLE ST / WELLINGTON ST  0.12913797      MAPLE      ST
#6  MEANDERING RD / MAIN ST -0.94150930 MEANDERING      ST
Or if you aren't fond of using within:
df$st_name <- sub(".*?([A-Z]{3,}).*", "\\1", df$x, perl=TRUE)
df$st_type <- sub(".+? ([A-Z]+)$", "\\1", df$x, perl=TRUE)
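One caveat that may be worth checking on real data: sub returns its input unchanged when the pattern does not match, whereas str_extract returns NA. The sketch below illustrates the difference and shows one possible base R way to get NA instead, using regexpr and regmatches. The second input string is hypothetical and is not part of the data in the question.

```r
# Hypothetical input: the second element has no run of 3+ capital letters.
x <- c("800 Block of MAIN ST", "unknown location")

# sub() leaves a non-matching string untouched, so the whole input comes back.
st_name_sub <- sub(".*?([A-Z]{3,}).*", "\\1", x, perl = TRUE)

# regexpr() reports -1 for elements without a match, and regmatches()
# returns only the matched pieces, so fill a vector of NAs explicitly.
m <- regexpr("[A-Z]{3,}", x, perl = TRUE)
st_name_na <- rep(NA_character_, length(x))
st_name_na[m != -1] <- regmatches(x, m)

print(st_name_sub)
print(st_name_na)
```

If every row is known to contain a street name, as in the sample data, the sub calls above are sufficient as written.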