Regular expression matching anything greater than eight letters in length, in Python

Despite attempts to master grep and related GNU software, I haven't come close to mastering regular expressions. I do like them, but I find them a bit of an eyesore all the same.

I suppose this question isn't difficult for some, but I've spent hours trying to figure out how to search through my favorite book for words greater than a certain length, and in the end, came up with some really ugly code:

twentyfours = [w for w in vocab if re.search('^........................$', w)]
twentyfives = [w for w in vocab if re.search('^.........................$', w)]
twentysixes = [w for w in vocab if re.search('^..........................$', w)]
twentysevens = [w for w in vocab if re.search('^...........................$', w)]
twentyeights = [w for w in vocab if re.search('^............................$', w)]

... a line for each length, all the way from a certain length to another one.

What I want instead is to be able to say 'give me every word in vocab that's greater than eight letters in length.' How would I do that?

asked Aug 30 '10 20:08 by magnetar

2 Answers

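You don't need regex for this. "Greater than eight letters" means nine or more, so:

result = [w for w in vocab if len(w) > 8]

(Use len(w) >= 8 if eight-letter words should be included as well.)

But if regex must be used:

import re

rx = re.compile('^.{9,}$')
#                  ^^^^ {9,} means 9 or more.
result = [w for w in vocab if rx.match(w)]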

See http://www.regular-expressions.info/repeat.html for detail on the {a,b} syntax.
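As a quick check that the two approaches agree, here is a minimal sketch. The sample `vocab` list is made up for illustration (the question builds its own from a book), and the grouping by length at the end is just one possible replacement for the one-list-per-length code in the question, not something the original answer proposes.

```python
import re
from collections import defaultdict

# Hypothetical sample vocabulary; the real one would come from the book.
vocab = ["cat", "elephant", "crocodile", "hippopotamus", "aardvarks"]

# Plain length test: strictly more than eight characters.
by_len = [w for w in vocab if len(w) > 8]

# Regex equivalent: fullmatch requires the whole word to match,
# so no explicit ^ and $ anchors are needed.
rx = re.compile(r".{9,}")
by_regex = [w for w in vocab if rx.fullmatch(w)]

# One dictionary keyed by length instead of one variable per length.
words_by_length = defaultdict(list)
for w in vocab:
    words_by_length[len(w)].append(w)

print(by_len)
print(by_regex)
print(words_by_length[9])
```

With this layout, `words_by_length[24]` plays the role of `twentyfours` from the question, and a missing length simply yields an empty list.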

answered Sep 24 '22 19:09 by kennytm


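\w matches letters, digits, and the underscore; {min,max} sets how many repetitions are allowed, and the maximum may be omitted. An expression like

\w{9,}

matches any run of 9 or more such characters. To require that the whole word is that long, anchor it (^\w{9,}$) or use re.fullmatch; otherwise re.search also accepts a string that merely contains such a run.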
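The pattern can also pull long words straight out of running text, without building a word list first. The sketch below assumes a made-up sentence and adds \b word boundaries, which are not part of the answer above. It also shows a letters-only variant, since \w accepts digits and underscores too.

```python
import re

# Hypothetical sample text; "riverbank_2" is there to show what \w accepts.
text = "The hippopotamus surprised a crocodile near the riverbank_2 today."

# \b marks a word boundary, so each match is a whole run of word characters.
long_words = re.findall(r"\b\w{9,}\b", text)

# [^\W\d_] means "a word character that is not a digit or underscore",
# which restricts the match to letters only.
long_letter_words = re.findall(r"\b[^\W\d_]{9,}\b", text)

print(long_words)
print(long_letter_words)
```

Here `riverbank_2` appears in the first list but not the second, because the underscore and digit count as word characters but not as letters.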

answered Sep 20 '22 19:09 by Ivo van der Wijk