
Find word not surrounded by alpha char

Tags: python, regex

After some searching, this seems more difficult than I thought: I am trying to write a regular expression in Python to find a word that is not surrounded by other letters or dashes.

In the following examples, I am trying to match ios:

  1. It seems carpedios
  2. I like "ios" because they have blue products
  3. I like carpedios and ios
  4. I like carpedios and ios.
  5. i like carped-ios

The matches should be as follows:

  • 1: don't match because ios is after d.
  • 2: match because ios is not surrounded by letters.
  • 3: match because one occurrence of ios is not surrounded by letters.
  • 4: match because one occurrence of ios is not surrounded by letters.
  • 5: don't match because ios is followed by -.

How can I do this with a regex?

asked Feb 19 '16 14:02 by bux




2 Answers

The following one should suit your needs:

(?<!-)\bios\b(?!-)
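
To check the pattern against the five sentences from the question, here is a minimal Python sketch. The names PATTERN, SENTENCES and has_standalone_ios are made up for illustration; only the regex itself comes from the answer.

```python
import re

# The pattern from the answer: word boundaries on both sides, plus
# lookarounds that reject a dash immediately before or after the word.
PATTERN = re.compile(r"(?<!-)\bios\b(?!-)")

# The five sample sentences from the question.
SENTENCES = [
    "It seems carpedios",
    'I like "ios" because they have blue products',
    "I like carpedios and ios",
    "I like carpedios and ios.",
    "i like carped-ios",
]


def has_standalone_ios(text):
    """Return True if text contains ios not touching a word character or dash."""
    return PATTERN.search(text) is not None


results = [has_standalone_ios(s) for s in SENTENCES]
for sentence, matched in zip(SENTENCES, results):
    print(f"{matched!s:5}  {sentence}")
```

Sentences 2, 3 and 4 match; 1 and 5 do not. Note that \b also treats digits and underscores as word characters, so a string such as ios7 is not matched by this pattern.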


answered Oct 14 '22 23:10 by sp00m


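You can use \b to match the empty string at the start or end of a word. However, \b on its own still allows a - next to the word, and it cannot go inside a character class: there it stops being a word boundary (Python reads [\b] as a backspace, and grep reads it as a literal backslash and b). To disallow both letters and -, we can use a character class containing both, then invert it. That would look something like this:

[^[:alpha:]-]

Let's pick that apart. [] is the character class itself. ^ at the start says to invert the match, so only characters not in the character class match. [:alpha:] is the POSIX name for "any letter"; in Python you would write a-zA-Z instead. Note that - has to come last (or first) in a character class, otherwise it will be mistaken for a range. (This allows you to say [0-9a-fA-F] as a shorthand for all hexadecimal digits.)

One catch: unlike \b, this class consumes a character, so it cannot match at the very start or end of a line. We cover those two cases with the alternatives ^ and $.

Let's try it! Here's your test file:

$ cat t.txt
It seems carpedios
I like "ios" because they have blue products
I like carpedios and ios
I like carpedios and ios.
i like carped-ios

Let's put together our pattern using the character class above:

$ grep -E '(^|[^[:alpha:]-])ios([^[:alpha:]-]|$)' t.txt
I like "ios" because they have blue products
I like carpedios and ios
I like carpedios and ios.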
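
Since the question asks for Python, here is a hedged sketch of the same idea with the re module. It uses a negative lookbehind and lookahead to reject a neighbouring letter or dash without consuming it, so a match at the very start or end of the string needs no special handling. The names STRICT and find_standalone are made up for illustration, and treating "letters" as ASCII a-z and A-Z is an assumption.

```python
import re

# Assumption: "letters" means ASCII a-z / A-Z. The dash is listed last in
# each class so that it is taken literally rather than as a range.
STRICT = re.compile(r"(?<![A-Za-z-])ios(?![A-Za-z-])")


def find_standalone(text):
    """Return the start offset of every ios not touching a letter or dash."""
    return [m.start() for m in STRICT.finditer(text)]


offsets = find_standalone("I like carpedios and ios.")
print(offsets)
```

Unlike a \b-based pattern, this one only excludes letters and dashes, so ios followed by a digit (as in ios7) still counts as a match.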

Hope this helps!

Update: I notice there's a good alternative answer, but I hope this adds some extra explanation.

answered Oct 14 '22 23:10 by Stig Brautaset