I need to remove everything after the question mark in a column.
I have a data set, for example:
my.data
BABY MOM LANDING
mark dina www.example.com/?kdvhzkajvkadjf
tom becky www.example.com/?ghkadkho[qeu
brad tina www.example.com/?klsdfngal;j
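For a reproducible example, the data set can be built in R roughly like this. This is an assumption, because the question does not say how LANDING is stored, so it is treated here as a character column:

```r
# Hypothetical reconstruction of the example data from the question.
# LANDING is kept as character, so sub() works on plain strings.
my.data <- data.frame(
  BABY    = c("mark", "tom", "brad"),
  MOM     = c("dina", "becky", "tina"),
  LANDING = c("www.example.com/?kdvhzkajvkadjf",
              "www.example.com/?ghkadkho[qeu",
              "www.example.com/?klsdfngal;j"),
  stringsAsFactors = FALSE
)
str(my.data)
```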
I want my new data to be:
new.data
BABY MOM LANDING
mark dina www.example.com/?
tom becky www.example.com/?
brad tina www.example.com/?
How do I tell R to remove everything after the ? in my.data$LANDING?
We can use sub to remove the characters that come after ?. The positive lookbehind ((?<=\\?).*) matches zero or more characters (.*) that are preceded by ?, and we replace that match with ''. Lookbehind is only supported by the PCRE engine, which is why perl=TRUE is needed.
my.data$LANDING <- sub('(?<=\\?).*$', '', my.data$LANDING, perl=TRUE)
my.data
# BABY MOM LANDING
#1 mark dina www.example.com/?
#2 tom becky www.example.com/?
#3 brad tina www.example.com/?
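A small sketch of how the lookbehind pattern behaves on a few edge cases. The extra inputs are hypothetical and not from the question: a string with no ? is left unchanged, and with several ? the cut happens right after the first one.

```r
# Hypothetical inputs to probe the lookbehind pattern.
x <- c("www.example.com/?kdvhzkajvkadjf",  # normal case     -> "www.example.com/?"
       "www.example.com/",                 # no ?, no match  -> unchanged
       "a?b?c")                            # several ?       -> "a?"
# perl = TRUE is required: the default regex engine has no lookbehind.
out <- sub('(?<=\\?).*$', '', x, perl = TRUE)
print(out)
```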
Another option is to use a capture group and pass the backreference (\\1) as the replacement, i.e. the second argument of sub.
my.data$LANDING <- sub('([^?]+\\?).*', '\\1', my.data$LANDING)
Here, [^?]+ matches one or more characters that are not ?, \\? matches the literal ?, and the parentheses capture both as group 1 (([^?]+\\?)). The trailing .* matches the rest of the string, which is dropped because the replacement is only the captured group.
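One subtlety of the capture-group version, shown on hypothetical inputs: because [^?]+ needs at least one character before the ?, a string that starts with ? does not match and is left untouched. The variant with [^?]* below is an illustrative tweak, not part of the original answer, that also handles that case.

```r
x <- c("www.example.com/?kdvhzkajvkadjf",  # normal case
       "www.example.com/",                 # no ?: cannot match, unchanged
       "?abc")                             # leading ?: [^?]+ finds nothing before it
out      <- sub('([^?]+\\?).*', '\\1', x)  # pattern from the answer
out_star <- sub('([^?]*\\?).*', '\\1', x)  # hypothetical variant: allow an empty prefix
print(out)       # "?abc" is returned as is
print(out_star)  # "?abc" becomes "?"
```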
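Or, as @Frank mentioned in the comments, we can match the ? together with the rest of the string (\\?.*) and replace the whole match with "\\?" as the second argument. A plain "?" works just as well, since ? has no special meaning in the replacement string.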
my.data$LANDING <- sub("\\?.*", "\\?", my.data$LANDING)
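A quick sanity check, assuming the three sample URLs from the question: all three sub() calls should produce the same vector. The last call uses a plain "?" as the replacement.

```r
landing <- c("www.example.com/?kdvhzkajvkadjf",
             "www.example.com/?ghkadkho[qeu",
             "www.example.com/?klsdfngal;j")
by_lookbehind <- sub('(?<=\\?).*$', '', landing, perl = TRUE)  # lookbehind
by_group      <- sub('([^?]+\\?).*', '\\1', landing)           # capture group
by_match      <- sub("\\?.*", "?", landing)                    # match ? and rest, put ? back
# All three approaches agree on the sample data.
stopifnot(identical(by_lookbehind, by_group), identical(by_group, by_match))
print(by_match)
```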