Extracting Words of specific length in R using regular expressions

Tags: string, regex, r

I have code like this (I got it here):

m<- c("Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow")

x<- gsub("\\<[a-z]\\{4,10\\}\\>","",m)
x

I also tried other ways of doing it, like:

m<- c("Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow")

x<- gsub("[^(\\b.{4,10}\\b)]","",m)
x

I need to remove words that are shorter than 4 or longer than 10 characters. Where am I going wrong?

asked Dec 10 '12 08:12

jackStinger

2 Answers

  gsub("\\b[a-zA-Z0-9]{4,10}\\b", "", m) 
 "! # is gr8. I  likewhatishappening ! The  of   is ! the aforementioned  is ! #Wow"

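This removes every word of 4 to 10 alphanumeric characters. What the parts of the regular expression mean:

\b matches at a position called a "word boundary" (written "\\b" inside an R string). The match is zero-length.
[a-zA-Z0-9]: one alphanumeric character (ASCII letter or digit)
{4,10}: {min,max}, i.e. repeat the preceding item at least 4 and at most 10 times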
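
To see the three pieces working together, here is a small sketch that extracts the matching words instead of deleting them. It reuses the string m from the question; perl = TRUE is my own choice here for predictable \b handling and is not part of the answer above.

```r
# String from the question
m <- "Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow"

# \\b = word boundary, [a-zA-Z0-9] = one letter or digit, {4,10} = 4 to 10 of them
pattern <- "\\b[a-zA-Z0-9]{4,10}\\b"

# gregexpr() locates every match; regmatches() pulls out the matched text
words_4_to_10 <- regmatches(m, gregexpr(pattern, m, perl = TRUE))[[1]]
words_4_to_10
# "Hello" "London" "really" "here" "alcomb" "Mount" "Everest" "excellent" "place" "amazing"
```

Words such as "gr8" (3 characters) and "aforementioned" (14 characters) are skipped because no word boundary falls within 4 to 10 characters of their start.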

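If you want the opposite, which is what the question asks for (remove words shorter than 4 or longer than 10 characters and keep the rest), match the short and the long words instead. Wrapping the pattern in ( ) and substituting a backreference does not invert a match, and a backreference in R is written \\1, not //1.

gsub("\\b([a-zA-Z0-9]{1,3}|[a-zA-Z0-9]{11,})\\b", "", m, perl = TRUE)

"Hello! #London  .  really  here!  alcomb  Mount Everest  excellent!   place  amazing! #"

Note that {4,10} is inclusive at both ends, so 4-letter words such as "here" count as in range and are kept by this second pattern.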
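
If the inverse removal is needed more than once, it can be wrapped in a small helper. This is only a sketch: the function name remove_words_outside and its arguments are made up for illustration, it assumes min_len is at least 2, and it treats a "word" as a run of ASCII letters and digits, as in the answer above. It also squeezes the runs of spaces that the deletions leave behind.

```r
# Hypothetical helper: drop words shorter than min_len or longer than max_len
remove_words_outside <- function(text, min_len = 4, max_len = 10) {
  # Match words of 1..(min_len - 1) characters or (max_len + 1)+ characters;
  # assumes min_len >= 2, otherwise the first quantifier would be invalid
  pattern <- sprintf(
    "\\b(?:[a-zA-Z0-9]{1,%d}|[a-zA-Z0-9]{%d,})\\b",
    min_len - 1, max_len + 1
  )
  # (?: ... ) is PCRE syntax, hence perl = TRUE
  stripped <- gsub(pattern, "", text, perl = TRUE)
  # Collapse leftover whitespace runs and trim both ends
  trimws(gsub("[[:space:]]+", " ", stripped))
}

m <- "Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow"
result <- remove_words_outside(m)
result
```

Punctuation is left in place, so stray marks such as the final "#" remain in the result; strip them first with gsub("[[:punct:]]", " ", m) if that is not wanted.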

answered Nov 05 '22 23:11

agstudy

# starting string
m <- c("Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow")

# remove punctuation (optional)
v <- gsub("[[:punct:]]", " ", m)

# split into distinct words
w <- strsplit( v , " " )

# calculate the length of each word
x <- nchar( w[[1]] )

# keep only words with length 4, 5, 6, 7, 8, 9, or 10
y <- w[[1]][ x %in% 4:10 ]

# string 'em back together
z <- paste( unlist( y ), collapse = " " )

# voila
z
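
The same split-and-filter steps can be folded into one function that also handles a vector of several strings. This is a sketch rather than a drop-in replacement: the name keep_words_by_length and its arguments are my own, and it splits on any run of whitespace instead of a single space.

```r
# Hypothetical helper: keep only words whose length is within [min_len, max_len]
keep_words_by_length <- function(text, min_len = 4, max_len = 10) {
  # Replace punctuation with spaces, as in the step-by-step version
  cleaned <- gsub("[[:punct:]]", " ", text)
  # One character vector of words per input string
  words <- strsplit(cleaned, "[[:space:]]+")
  # Filter each word vector by length and glue it back together
  vapply(words, function(w) {
    n <- nchar(w)
    paste(w[n >= min_len & n <= max_len], collapse = " ")
  }, character(1))
}

m <- "Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow"
kept <- keep_words_by_length(m)
kept
# "Hello London really here alcomb Mount Everest excellent place amazing"
```

Empty strings produced by the split have length 0, so the length filter drops them along with the short words.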

answered Nov 05 '22 22:11

Anthony Damico
