
R getting substrings and regular expressions?

Tags: string, regex, r

I have a set of strings that are file names. I want to extract all characters after the # symbol but before the file extension. For example, one of the file names is:

HelloWorld#you.txt

I would want to return the string "you".

Here is my code:

    hashPos = grep("#", name, fixed=TRUE)
    dotPos = length(name)-3
    finalText = substring(name, hashPos, dotPos)

I read online that grep is supposed to return the index where the first parameter occurs (in this case the # symbol). So, I was expecting the above to work but it does not.

Or how would I use a regular expression to extract this string? Also, what happens when the string does not have a # symbol? Would the function return a special value such as -1?

CodeKingPlusPlus asked Mar 15 '13 00:03


1 Answer

Here is a one-liner solution:

    gsub(".*\\#(.*)\\..*", "\\1", c("HelloWorld#you.txt"))

Output:

    [1] "you"

To explain the code: the pattern matches everything up to the #, captures all characters between the # and the last ., and then matches the extension. Replacing the whole match with \\1 (the captured group) leaves only the in-between string, which is what you are looking for.
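On the question of a missing # symbol, here is a small sketch of what I would expect to happen. The second file name and the variable names are made up for illustration. When the pattern does not match, sub() and gsub() return the input unchanged rather than a special value. Note also that grep() returns the indices of the vector elements that match, not a character position; regexpr() is the function that reports a position, and it gives -1 when there is no match.

```r
# Hypothetical inputs; only the first name appears in the question.
files <- c("HelloWorld#you.txt", "NoHashHere.txt")
pattern <- ".*\\#(.*)\\..*"

# sub() is enough here because the pattern consumes the whole string.
# A name without '#' does not match, so it comes back unchanged.
extracted <- sub(pattern, "\\1", files)

# regexpr() gives the character position of the first '#', or -1 if absent.
hash_pos <- as.integer(regexpr("#", files, fixed = TRUE))

# Mark names without '#' as NA instead of passing them through.
result <- extracted
result[hash_pos == -1L] <- NA_character_
print(result)
```

Whether NA is the right marker for "no # found" depends on what the rest of your code expects; it is just one option.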

Edit:

The above solution captures everything up to the last ., so the extracted part may itself contain dots. If you want to extract the name only up to the first ., you can use the regex .*\\#(\\w*)\\..* instead. Note that \\w matches only letters, digits, and underscores.
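To make the difference concrete, here is a quick comparison on a file name with two dots. The name is invented for the example and is not from the question.

```r
# Hypothetical file name with two dots; not from the question.
name <- "HelloWorld#you.tar.gz"

# Greedy (.*) backtracks only as far as the last dot.
up_to_last_dot <- sub(".*\\#(.*)\\..*", "\\1", name)

# (\\w*) cannot cross a dot, so the capture ends at the first dot.
up_to_first_dot <- sub(".*\\#(\\w*)\\..*", "\\1", name)

print(up_to_last_dot)   # [1] "you.tar"
print(up_to_first_dot)  # [1] "you"
```

If the part after # can contain characters such as hyphens or spaces, the \\w version will not match at all and the input comes back unchanged. A negated class such as ([^.]*) in place of (\\w*) may be a closer fit in that case.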

iTech answered Nov 15 '22 22:11