Regular expression to grab word before a certain character (R, Perl)

Q: How do I match a character in Perl?

The m operator in Perl matches a pattern against a given string. The pattern can be enclosed in almost any delimiter character, for example m/.../, m{...} or m#...#, and the chosen character then marks where the regular expression starts and ends.

Q: What is \W in Perl regex?

\w matches a single word character (a letter, a decimal digit, or _), not a whole word. Use \w+ to match a run of Perl-identifier characters (which isn't the same as matching an English word). The uppercase \W is the negation: it matches any single character that is not a word character.
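A quick way to see the difference from R is sketched below. The sample string x is made up for illustration, and perl = TRUE is passed so the Perl-compatible engine handles the patterns.

```r
# Hypothetical sample string, not from the original post
x <- "foo_bar1 baz"

# \w matches exactly one word character (letter, digit or underscore)
one_char <- regmatches(x, regexpr("\\w", x, perl = TRUE))

# \w+ matches a run of word characters and stops at the space
one_run <- regmatches(x, regexpr("\\w+", x, perl = TRUE))

# \W is the negation: the first non-word character here is the space
non_word <- regmatches(x, regexpr("\\W", x, perl = TRUE))
```

Here one_char holds only the first character of the string, while one_run holds everything up to the first space.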

Q: How do you match a character sequence in regex?

To match a character that has a special meaning in regex, escape it by prefixing it with a backslash ( \ ). E.g., \. matches "." ; \+ matches "+" ; and \( matches "(" . To match a literal backslash, use \\ .
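In R the same escapes look slightly heavier, because each regex backslash must itself be written as \\ inside a string literal. A minimal sketch, with test strings invented for illustration:

```r
# Each call tests two hypothetical strings: one containing the literal
# character, one without it.
has_dot   <- grepl("\\.", c("a.b", "ab"))     # regex \. matches "."
has_plus  <- grepl("\\+", c("1+1", "11"))     # regex \+ matches "+"
has_paren <- grepl("\\(", c("f(x)", "fx"))    # regex \( matches "("
has_slash <- grepl("\\\\", c("a\\b", "ab"))   # regex \\ matches "\"

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
has_plus_fixed <- grepl("+", c("1+1", "11"), fixed = TRUE)
```

Using fixed = TRUE is often the simpler choice when the whole pattern is a literal string rather than a regular expression.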

Tags:

regex

r

perl

I need to get words before and after a unique character (in my case: &) in a string in R.

I need to get 'word1' from something like this: "...something something word1 & word2 something..."

I can get the word after it using a Perl regular expression in R: (?<=& )[^ ]*(?= ) (It seems to behave the way I would like. I got it by combining answers I found on this site.)

I now need to get the word preceding the & symbol. The length of the words varies, as do the number of preceding words and the number of spaces. The word may contain letters and numbers and is bounded by spaces on either side.


asked Feb 19 '13 00:02

GregS

1 Answer

If you use (\S+)\s*&\s*(\S+) then the words on both sides of & will be captured. The \s* on each side allows for optional whitespace around the ampersand.

You need to double-up the backslashes in an R string, and use the regexec and regmatches functions to apply the pattern and extract the matched substrings.


string  <- "...something something word1 & word2 something..."
pattern <- "(\\S+)\\s*&\\s*(\\S+)"
match   <- regexec(pattern, string)
words   <- regmatches(string, match)

Now words is a one-element list holding a three-item character vector: the whole matched string followed by the contents of the first and second capture groups. So words[[1]][2] is word1 and words[[1]][3] is word2.
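The same approach extends to a character vector, and a lookahead is an alternative when only the preceding word is wanted. The following is a sketch under assumptions: the strings vector and the helper pick() are hypothetical names introduced for illustration, and inputs without an ampersand are mapped to NA.

```r
# Hypothetical inputs: two with an ampersand, one without
strings <- c("alpha word1 & word2 omega", "a1 &b2", "no ampersand here")
pattern <- "(\\S+)\\s*&\\s*(\\S+)"

# regexec/regmatches are vectorised; a non-matching element yields character(0)
matches <- regmatches(strings, regexec(pattern, strings))

# Hypothetical helper: take capture group i, or NA when there was no match
pick <- function(m, i) if (length(m) >= i) m[i] else NA_character_
before <- vapply(matches, pick, character(1), i = 2)
after  <- vapply(matches, pick, character(1), i = 3)

# Lookahead variant: match only the word that is followed by optional
# whitespace and "&", without consuming the ampersand (needs perl = TRUE)
string <- "...something something word1 & word2 something..."
before_only <- regmatches(string,
                          regexpr("\\S+(?=\\s*&)", string, perl = TRUE))
```

The capture-group version returns both words in one pass, while the lookahead version mirrors the lookbehind pattern from the question and returns the matched word directly.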


answered Nov 15 '22 22:11

Borodin
