
Removing everything after a character in a column in R

Tags: r, character

I need to remove everything after the question mark in a column.

I have a data set, for example:

my.data

BABY      MOM      LANDING
mark      dina     www.example.com/?kdvhzkajvkadjf
tom       becky    www.example.com/?ghkadkho[qeu
brad      tina     www.example.com/?klsdfngal;j

I want my new data to be:

new.data

BABY      MOM      LANDING
mark      dina     www.example.com/?
tom       becky    www.example.com/?
brad      tina     www.example.com/?

How do I tell R to remove everything after the ? in my.data$LANDING?

Ally Kat asked Aug 05 '15 15:08


1 Answer

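We can use sub to remove the characters that come after the ?. A positive lookbehind ((?<=\\?)) requires the match to start immediately after a ?, and .*$ then matches zero or more characters up to the end of the string. The match is replaced with ''. Lookbehind is only supported by the Perl-compatible engine, so perl=TRUE is needed.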

 my.data$LANDING <- sub('(?<=\\?).*$', '', my.data$LANDING, perl=TRUE)
 my.data
 #  BABY   MOM       LANDING
 #1 mark  dina www.example.com/?
 #2  tom becky www.example.com/?
 #3 brad  tina www.example.com/?
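
The question only shows my.data as printed output, so here is a minimal reproducible sketch of the lookbehind approach. It assumes LANDING is a character column (hence stringsAsFactors = FALSE), and it writes the result to new.data so the original is left untouched; neither detail is stated in the post.

```r
# Rebuild the example data; the values are copied from the question.
# Assumption: LANDING is stored as character, not as a factor.
my.data <- data.frame(
  BABY = c("mark", "tom", "brad"),
  MOM = c("dina", "becky", "tina"),
  LANDING = c("www.example.com/?kdvhzkajvkadjf",
              "www.example.com/?ghkadkho[qeu",
              "www.example.com/?klsdfngal;j"),
  stringsAsFactors = FALSE
)

# Work on a copy so my.data keeps the original URLs.
new.data <- my.data

# (?<=\\?) anchors the match right after the first "?";
# .*$ consumes everything from there to the end of the string.
new.data$LANDING <- sub("(?<=\\?).*$", "", new.data$LANDING, perl = TRUE)

print(new.data)
```

If LANDING were a factor, sub would still return a character vector, so the column would come back as character rather than factor.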

Another option is to use a capture group and refer to it with a backreference (\\1) in the replacement, which is the second argument of sub.

 my.data$LANDING <- sub('([^?]+\\?).*', '\\1', my.data$LANDING)

Here, we match one or more characters that are not ? ([^?]+) followed by a ? (\\?) and use parentheses to capture them as a group (([^?]+\\?)). The rest of the characters (.*) are matched outside the group, so they are dropped when the whole match is replaced by \\1.

Or, as @Frank mentioned in the comments, we can match the ? together with the rest of the characters (\\?.*) and replace the whole match with a literal ? (written as \\? in the second argument).

 my.data$LANDING <- sub("\\?.*", "\\?", my.data$LANDING)
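
To see how the three patterns compare, the sketch below runs them on a small vector that also contains a URL without a ? and one with two ? characters. The vector x and its last two values are made up for illustration, and the third call uses a plain "?" as the replacement instead of \\?, which is a simplification on my part rather than what the answer wrote.

```r
# Hypothetical test vector: only the first value comes from the question.
x <- c("www.example.com/?kdvhzkajvkadjf",
       "www.example.com/no-query",
       "www.example.com/?a=1?b=2")

# 1. Positive lookbehind (needs the PCRE engine).
lookbehind <- sub("(?<=\\?).*$", "", x, perl = TRUE)

# 2. Capture everything up to and including the first "?", keep only that.
captured <- sub("([^?]+\\?).*", "\\1", x)

# 3. Match from the first "?" onward and put a literal "?" back.
literal <- sub("\\?.*", "?", x)

print(lookbehind)
print(identical(lookbehind, captured))
print(identical(lookbehind, literal))
```

On these inputs all three calls cut at the first ? and leave a string without a ? unchanged, because sub returns the input as-is when the pattern does not match.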

akrun answered Nov 08 '22 16:11