Remove words of length less than 4 from string [duplicate]

Question

I am trying to remove words of length less than 4 from a string.

I use this regex:

 re.sub(r' \w{1,3} ', ' ', c)

This removes some of the short words, but it fails when two or three words of length less than 4 appear next to each other. For example:

 I am in a bank.

It gives me:

 I in bank.

How to resolve this?

Martijn Pieters · Accepted Answer

Don't include the spaces; use \b word boundary anchors instead:

re.sub(r'\b\w{1,3}\b', '', c)

This removes words of up to 3 characters entirely, but leaves the whitespace around them in place:

>>> import re
>>> re.sub(r'\b\w{1,3}\b', '', 'The quick brown fox jumps over the lazy dog')
' quick brown  jumps over  lazy '
>>> re.sub(r'\b\w{1,3}\b', '', 'I am in a bank.')
'    bank.'
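
If the leftover runs of spaces are a problem, one possible follow-up is a second pass that collapses whitespace. This is only a sketch: the helper name drop_short_words and its max_len parameter are made up for illustration and are not part of the answer above.

```python
import re

def drop_short_words(text, max_len=3):
    # Hypothetical helper: remove whole words of 1..max_len word characters.
    stripped = re.sub(r'\b\w{1,%d}\b' % max_len, '', text)
    # Collapse the runs of whitespace left behind and trim both ends.
    return re.sub(r'\s+', ' ', stripped).strip()

print(drop_short_words('I am in a bank.'))
print(drop_short_words('The quick brown fox jumps over the lazy dog'))
```

Note that this normalizes all whitespace (tabs and newlines included) to single spaces, which may or may not be what you want.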

Vidhya G · Answer

If you want an alternative to regex:

new_string = ' '.join([w for w in old_string.split() if len(w)>3])
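
One thing to keep in mind with the split approach: str.split() only splits on whitespace, so punctuation stays attached to a word and counts toward its length. A small comparison, as a sketch (the function name keep_long_words and the sample sentence are illustrative assumptions):

```python
import re

def keep_long_words(old_string, min_len=4):
    # Hypothetical wrapper around the list comprehension above.
    return ' '.join(w for w in old_string.split() if len(w) >= min_len)

sample = 'The cat sat.'

# 'sat.' is 4 characters long including the period, so it survives.
split_result = keep_long_words(sample)

# The \b version treats 'sat' as a 3-letter word and removes it.
regex_result = re.sub(r'\b\w{1,3}\b', '', sample)

print(repr(split_result))   # 'sat.'
print(repr(regex_result))   # '  .'
```

So the two answers can disagree on words that sit next to punctuation; which behaviour is right depends on your data.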

Sizik · Answer

Answered by Martijn, but I just wanted to explain why your regex doesn't work. The pattern ' \w{1,3} ' matches a space, followed by 1-3 word characters, followed by another space. The I doesn't get matched because it has no space in front of it. The am gets replaced together with both of its surrounding spaces, and the regex engine then resumes at the next unconsumed character: the i of in. It never sees the space before in, because that space was already consumed as the trailing space of the previous match, and matches cannot overlap. So the next match it finds is a, which produces your output string.
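
To see this for yourself, you can list where the original pattern actually matches. The sketch below also shows a lookaround variant that checks for the surrounding spaces without consuming them; that variant is just an illustration of the point, not something taken from the answers.

```python
import re

text = 'I am in a bank.'

# Each match consumes its leading and trailing space, so ' in ' can never
# match: its leading space already belongs to the ' am ' match.
spans = [m.span() for m in re.finditer(r' \w{1,3} ', text)]
print(spans)            # [(1, 5), (7, 10)]

original = re.sub(r' \w{1,3} ', ' ', text)
print(repr(original))   # 'I in bank.'

# Lookarounds assert the whitespace without consuming it, so adjacent
# short words are all matched. 'I' stays because nothing precedes it.
fixed = re.sub(r'(?<=\s)\w{1,3}(?=\s)', '', text)
print(repr(fixed))      # 'I    bank.'
```

The \b version is still simpler and also handles words at the very start or end of the string.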
