I need to remove everything after the question mark in a column.
I have a data set, for example:
my.data
BABY MOM LANDING
mark dina www.example.com/?kdvhzkajvkadjf
tom becky www.example.com/?ghkadkho[qeu
brad tina www.example.com/?klsdfngal;j
I want my new data to be:
new.data
BABY MOM LANDING
mark dina www.example.com/?
tom becky www.example.com/?
brad tina www.example.com/?
How do I tell R to remove everything after the "?" in my.data$LANDING?
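To try the answers below, you can rebuild the example data from the question. This is a minimal sketch: the values are copied from the printout above. stringsAsFactors = FALSE keeps LANDING as character (it has been the default since R 4.0.0, and sub() would coerce a factor with as.character() anyway).

```r
# Rebuild my.data from the question's printout so the answers can be run as-is.
my.data <- data.frame(
  BABY    = c("mark", "tom", "brad"),
  MOM     = c("dina", "becky", "tina"),
  LANDING = c("www.example.com/?kdvhzkajvkadjf",
              "www.example.com/?ghkadkho[qeu",
              "www.example.com/?klsdfngal;j"),
  stringsAsFactors = FALSE  # keep LANDING as a plain character column
)
str(my.data)
```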
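We can use sub to remove the characters that come after "?". We use a positive lookbehind, (?<=\\?).*$, to match zero or more characters (.*) that are preceded by "?", and replace them with ''. The lookbehind needs perl = TRUE, because R's default regex engine does not support lookaround.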
my.data$LANDING <- sub('(?<=\\?).*$', '', my.data$LANDING, perl=TRUE)
my.data
# BABY MOM LANDING
#1 mark dina www.example.com/?
#2 tom becky www.example.com/?
#3 brad tina www.example.com/?
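A short sketch of how the lookbehind pattern behaves on a few made-up URLs (they are not from the question): sub() replaces only the leftmost match, so everything after the first "?" is removed, while a string with no "?" or with nothing after it comes back unchanged.

```r
# Made-up inputs that exercise the edge cases of the lookbehind pattern.
urls <- c("www.example.com/?a=1?b=2",  # two '?': the cut happens after the first one
          "www.example.com/page",      # no '?': nothing is preceded by '?', so no change
          "www.example.com/?")         # nothing after '?': empty match, so no change
# perl = TRUE switches to PCRE, which supports the (?<=...) lookbehind.
trimmed <- sub("(?<=\\?).*$", "", urls, perl = TRUE)
print(trimmed)
```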
Another option is to use a capture group and pass the back-reference to it (\\1) as the replacement, i.e. the second argument of sub.
my.data$LANDING <- sub('([^?]+\\?).*', '\\1', my.data$LANDING)
Here, we match one or more characters that are not "?" ([^?]+) followed by "?" (\\?), and wrap both in parentheses to capture them as a group (([^?]+\\?)). The remaining characters (.*) are also matched but lie outside the group, so replacing the whole match with \\1 drops them.
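The same made-up edge cases (not from the question) show one difference from the lookbehind version: [^?]+ needs at least one character before the "?", so a string that starts with "?" is not matched at all. This sketch also shows the variant with [^?]* (zero or more), which handles that case too.

```r
# Made-up inputs that exercise the edge cases of the capture-group pattern.
urls <- c("www.example.com/?a=1?b=2",  # [^?]+ cannot cross a '?', so the cut is at the first one
          "www.example.com/page",      # no '?': no match, so the string is returned unchanged
          "?only-a-query")             # nothing before '?': [^?]+ needs >= 1 character, no match
kept <- sub("([^?]+\\?).*", "\\1", urls)
# With * instead of +, the group may be just the '?' itself.
kept_star <- sub("([^?]*\\?).*", "\\1", urls)
print(kept)
print(kept_star)
```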
Or, as @Frank mentioned in the comments, we can match the "?" together with the rest of the characters (\\?.*) and replace the whole match with \\? as the second argument.
my.data$LANDING <- sub("\\?.*", "\\?", my.data$LANDING)
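For comparison, here is a sketch of a non-regex approach. It is a swapped-in technique, not one from the answers above: regexpr(..., fixed = TRUE) finds the first literal "?", and substring() keeps everything up to and including it. The helper name keep_through_first_q is hypothetical.

```r
# Hypothetical helper: keep each string up to and including its first '?'.
keep_through_first_q <- function(x) {
  # fixed = TRUE treats '?' literally; as.integer() drops regexpr's attributes.
  pos <- as.integer(regexpr("?", x, fixed = TRUE))  # -1 where there is no '?'
  out <- x
  hit <- pos > 0
  out[hit] <- substring(x[hit], 1, pos[hit])        # strings without '?' stay as-is
  out
}

urls <- c("www.example.com/?kdvhzkajvkadjf",
          "www.example.com/?ghkadkho[qeu",
          "www.example.com/?klsdfngal;j",
          "www.example.com/page")                   # extra made-up URL with no '?'
out <- keep_through_first_q(urls)
print(out)
```

Like the sub() versions, strings without a "?" are returned unchanged.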