 

Extract year from string and append to dataframe

Tags: r

I have a wine dataset with a column called "title" that contains the name of each wine, including its vintage year. Sample values:

  • Pull 2013 Chardonnay (Paso Robles)
  • R2 2013 Camp 4 Vineyard Grenache Blanc (Santa Ynez Valley)

I want to extract just the year from each string (i.e. 2013), not the other numbers in the string (e.g. 2, 4).

I got this far:

library(stringr)

# Extract vintage year from title column
wine_tidy2$vintage_year <- as.list(str_extract_all(wine_tidy2$title, "[0-9]+"))

But how do I exclude other numbers that are not part of the year?

I also want to append the result to the data frame. The code above adds the result as a list column; how can I add it as an integer column instead?

Thank you.

azmirfakkri asked Apr 20 '26 21:04


1 Answer

You can use sub() or regexec() from base R to search for runs of exactly four digits:

string <- c('R2 2013 Camp 4 Vineyard Grenache Blanc', 'Santa Ynez Valley 1999', 'dsdd 2015')
sub("^.*([0-9]{4}).*", "\\1", string)
unlist(regmatches(string, regexec("[0-9]{4}", string)))
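The two calls behave differently when a string contains no four-digit run, which matters when the result has to line up with the rows of a data frame. The sketch below uses a made-up title without a year ("dsdd 15") to illustrate this; the variable names via_sub and via_regmatches are only for this example.

```r
# Two sample titles: one with a 4-digit year, one without
string <- c("R2 2013 Camp 4 Vineyard Grenache Blanc", "dsdd 15")

# sub() returns the input unchanged when the pattern does not match,
# so the result keeps one element per input string
via_sub <- sub("^.*([0-9]{4}).*", "\\1", string)
print(via_sub)

# regmatches() gives an empty match for the second string, and unlist()
# drops it, so the result is shorter than the input
via_regmatches <- unlist(regmatches(string, regexec("[0-9]{4}", string)))
print(length(via_regmatches))
```

Neither result can be assigned directly as a clean year column in this situation: the first contains a non-year string, and the second has the wrong length.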

For your case:

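# create a helper function: returns the first 4-digit run in each string
# as an integer, or NA when the string has no such run
yearExtract <- function(string) {
  matches <- regmatches(string, regexec("[0-9]{4}", string))
  sapply(matches, function(x) {
    if (length(x) > 0) {
      as.integer(x)
    } else {
      NA_integer_
    }
  })
}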


# create data.frame
title <- c('R2 2013 Camp 4 Vineyard Grenache Blanc', 'Santa Ynez Valley 1999', 'dsdd 15')
distributor <- c('a', 'b', 'd')
wine_tidy2 <- data.frame(title, distributor)

wine_tidy2$vintage_year <- yearExtract(as.character(wine_tidy2$title))
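As a possible variation, the same idea can be written without sapply() by using regexpr(), which returns one match position per string. The sketch below is not part of the original answer: the function name extract_vintage and the data frame wine_demo are hypothetical, and the pattern assumes that a vintage is a standalone four-digit number starting with 19 or 20, which may not hold for every title.

```r
# Hypothetical helper (not from the answer above): vectorised, returns integers
extract_vintage <- function(x) {
  # Assumption: a vintage is a standalone 4-digit number starting with 19 or 20
  m <- regexpr("\\b(19|20)[0-9]{2}\\b", x, perl = TRUE)
  # TRUE where a match was found; regexpr() gives -1 (or NA) otherwise
  hit <- !is.na(m) & m != -1
  # Start with NA everywhere, then fill in the matched years
  out <- rep(NA_integer_, length(x))
  out[hit] <- as.integer(regmatches(x, m))
  out
}

# Small demo data frame using the sample titles from this thread
wine_demo <- data.frame(
  title = c("R2 2013 Camp 4 Vineyard Grenache Blanc",
            "Santa Ynez Valley 1999",
            "dsdd 15"),
  stringsAsFactors = FALSE
)
wine_demo$vintage_year <- extract_vintage(wine_demo$title)
str(wine_demo)
```

The word boundaries (\\b) keep the pattern from matching four digits inside a longer number. If stringr is already loaded, str_extract() (rather than str_extract_all()) should give a similar one-value-per-row result, with NA where nothing matches, which can then be wrapped in as.integer().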
and-bri answered Apr 22 '26 10:04