 

R: regexp question

Tags:

regex

r

I need to reshape my data frame using a regexp. In particular, a value like this

X21_GS04.A.mzdata

must become:

GS04.A

I tried

pluto <- sub('^X[0-90_]+','', my.data.frame$File.Name, perl=TRUE)

and it works; then I tried

pluto <- sub('.mzdata$','', my.data.frame$File.Name, perl=TRUE)

and it works too.

The problem is that I have no idea how to combine the two into one. I tried something like this:

pluto <- sub('^X[0-90_]+ | .mzdata$','', my.data.frame$File.Name, perl=TRUE)

but nothing happens. Can someone tell me where I went wrong?

Best, Riccardo

asked Jul 22 '11 09:07 by Riccardo


1 Answer

The regular expression you’re after is this:

^X\d+_(.*)\.mzdata$

This matches the whole file name and captures the part you want to keep (everything between the `X<digits>_` prefix and the `.mzdata` suffix) in a group. You can then replace the whole match with \1, a backreference to that capture group.

In R, this would be:

result <- sub('^X\\d+_(.*)\\.mzdata$', '\\1', my.data.frame$File.Name, perl=TRUE)
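As a quick check, the sketch below applies the answer's pattern to a small, made-up data frame (the sample file names are assumptions; only the `File.Name` column comes from the question). It also shows one possible fix for the combined attempt in the question: with `perl=TRUE`, the spaces around `|` are matched literally, so neither alternative can match these names, and `sub()` replaces only the first match, so stripping both the prefix and the suffix in a single call needs `gsub()`. The dot before `mzdata` is escaped so that it matches only a literal period.

```r
# Hypothetical sample data; only the column name File.Name follows the question.
my.data.frame <- data.frame(
  File.Name = c("X21_GS04.A.mzdata", "X3_GS10.B.mzdata"),
  stringsAsFactors = FALSE
)

# Approach 1 (the answer's): match the whole name and keep capture group 1.
result <- sub('^X\\d+_(.*)\\.mzdata$', '\\1', my.data.frame$File.Name, perl = TRUE)

# Approach 2: the question's alternation without the literal spaces around |,
# with the dot escaped and the redundant 0 dropped from the character class.
# gsub() is used because the prefix and the suffix are two separate matches,
# and sub() would remove only the first one.
pluto <- gsub('^X[0-9_]+|\\.mzdata$', '', my.data.frame$File.Name, perl = TRUE)

print(result)
print(pluto)
```

Both approaches give `GS04.A` and `GS10.B` for these sample names. One difference: the capture-group version leaves a name unchanged if it does not fit the full `X<digits>_….mzdata` shape, whereas the `gsub()` version strips whichever part it finds.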
answered Oct 27 '22 03:10 by Konrad Rudolph