I want to use a regex to extract all URLs from a text column of a data frame into a new column. I have some older code that I used to extract keywords, and I'm looking to adapt it for a regex. I want to save the regex as a string variable and apply it here:
data$ContentURL <- apply(sapply(regex, grepl, data$Content, fixed=FALSE), 1, function(x) paste(selection[x], collapse=','))
As I understand it, fixed=FALSE (the default) tells grepl that the pattern is a regular expression, but R doesn't like how I am trying to save the regex:
regex <- "http.*?1-\\d+,\\d+"
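A quick check in base R suggests two separate problems with the attempt above. The string itself is a valid R regex, but it requires a literal "1-", digits, a comma and more digits somewhere after "http", which none of the sample URLs contain. Also, grepl() only reports whether a pattern matched, and the keyword code pastes `selection[x]`, which presumably held the search terms: for fixed keywords the term and the matched text are the same, but for a regex it would paste the pattern rather than the URL. The `content` vector below is introduced only for illustration and copies the strings from the data frame in the question.

```r
# Illustrative copy of the question's Content column
content <- c("a house a home https://www.foo.com",
             "cabin ideas https://www.example.com in the woods",
             "motel is a hotel")

regex <- "http.*?1-\\d+,\\d+"  # parses fine; it just describes the wrong text

# No string contains "1-<digits>,<digits>" after "http", so nothing matches
no_match <- grepl(regex, content)
no_match
# [1] FALSE FALSE FALSE

# With a pattern that does match, grepl() still returns only TRUE/FALSE,
# never the matched text itself
detected <- grepl("https?://", content)
detected
# [1]  TRUE  TRUE FALSE
```

Extracting the URL therefore needs a function that returns match positions or matched substrings, such as stringr::str_extract() or base R's regexpr() with regmatches().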
My data is organized in a data frame like this:
data <- read.table(text='"Content"     "date"   
 1     "a house a home https://www.foo.com"     "12/31/2013"
 2     "cabin ideas https://www.example.com in the woods"     "5/4/2013"
 3     "motel is a hotel"   "1/4/2013"', header=TRUE)
The result should look like this:
                                           Content       date              ContentURL
1               a house a home https://www.foo.com 12/31/2013     https://www.foo.com
2 cabin ideas https://www.example.com in the woods   5/4/2013 https://www.example.com
3                                 motel is a hotel   1/4/2013                        
Hadleyverse solution (stringr package) with a decent URL pattern:
library(stringr)
url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
data$ContentURL <- str_extract(data$Content, url_pattern)
data
##                                            Content       date              ContentURL
## 1               a house a home https://www.foo.com 12/31/2013     https://www.foo.com
## 2 cabin ideas https://www.example.com in the woods   5/4/2013 https://www.example.com
## 3                                 motel is a hotel   1/4/2013                    <NA>
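If you would rather avoid the stringr dependency, a base R sketch with the same pattern could use regexpr() to locate the first match in each string and regmatches() to pull it out. This is an alternative, not part of the answer above; it assumes perl = TRUE (PCRE, which supports the (?:...) groups in the pattern) and adds stringsAsFactors = FALSE so Content stays character on R versions before 4.0.

```r
data <- read.table(text = '"Content"     "date"
 1     "a house a home https://www.foo.com"     "12/31/2013"
 2     "cabin ideas https://www.example.com in the woods"     "5/4/2013"
 3     "motel is a hotel"   "1/4/2013"', header = TRUE, stringsAsFactors = FALSE)

url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"

# Position of the first match in each string; -1 where there is none
m <- regexpr(url_pattern, data$Content, perl = TRUE)

# regmatches() returns only the successful matches, so start from NA
# and fill in the rows that actually contain a URL
data$ContentURL <- NA_character_
data$ContentURL[m != -1] <- regmatches(data$Content, m)
data  # row 3 gets NA, as in the stringr version
```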
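If a Content string can contain more than one URL, use str_extract_all instead. It returns a list with one character vector of matches per row, so you will need some extra processing afterwards, for example collapsing each element into a single string.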
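A minimal sketch of that extra processing, assuming you want the comma-separated format your original keyword code produced with paste(..., collapse = ','). The first input string is a made-up example containing two URLs; rows without a URL end up as an empty string rather than NA.

```r
library(stringr)

# Hypothetical input: the first string holds two URLs
content <- c("see https://www.foo.com and http://example.org/page",
             "cabin ideas https://www.example.com in the woods",
             "motel is a hotel")

url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"

# One character vector of matches per string; character(0) when none match
urls <- str_extract_all(content, url_pattern)

# Collapse each vector to a single string; paste(character(0), collapse = ",") gives ""
joined <- vapply(urls, paste, character(1), collapse = ",")
joined
```

For the data frame in the question, the same two steps applied to data$Content would fill data$ContentURL.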