Regex: ignore extra characters

Question

I'm trying to figure out how to detect spam words that have extra characters inserted into them, such as:

pha.rmacy or vi*agra

Any ideas?

João Silva · Accepted Answer

You could use a (dis)similarity metric, such as edit distance. For instance, the edit distance between vi.agra and viagra is 1.

You then treat a given word as a match for the spam word if the edit distance between them is below some threshold, say 2.
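
As a rough illustration of the idea above, here is a minimal Python sketch of the Levenshtein edit distance with a threshold check. The function names, the lowercasing step, and the example word list are assumptions made for illustration; they are not part of the original answer.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def looks_like_spam(word: str, spam_words, threshold: int = 2) -> bool:
    """True if word is within (strictly below) threshold edits of any spam word.
    Lowercasing is an added assumption so that case changes cost nothing."""
    w = word.lower()
    return any(edit_distance(w, s) < threshold for s in spam_words)


# Hypothetical word list, for illustration only.
SPAM_WORDS = ["viagra", "pharmacy"]
print(looks_like_spam("vi*agra", SPAM_WORDS))
print(looks_like_spam("pha.rmacy", SPAM_WORDS))
print(looks_like_spam("hello", SPAM_WORDS))
```

With a threshold of 2, only words at distance 0 or 1 count as matches. A higher threshold catches more obfuscations but also flags more innocent words, especially short ones.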

If you really want to use a regex, you can use something like /[^a-zA-Z0-9-\s]/ to strip punctuation from the word before comparing it. That approach, however, fails to identify something like viZagra as viagra, because the inserted character is a letter rather than punctuation.
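
The regex-stripping approach might look like the following Python sketch. The `normalize` helper and the lowercasing step are assumptions added for illustration. The hyphen is moved to the end of the character class so it is unambiguously a literal; the set of characters kept is unchanged.

```python
import re

# Anything that is not a letter, digit, whitespace, or hyphen.
NON_WORD = re.compile(r"[^a-zA-Z0-9\s-]")


def normalize(word: str) -> str:
    """Strip punctuation-like characters, then lowercase (hypothetical helper)."""
    return NON_WORD.sub("", word).lower()


print(normalize("pha.rmacy") == "pharmacy")  # punctuation removed, so it matches
print(normalize("vi*agra") == "viagra")      # same here
print(normalize("viZagra") == "viagra")      # the extra letter survives stripping
```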

Mark Wilkins · Answer

Regular expressions do not seem like the right tool for this. Still, since the question is interesting, a simple approach would be something like this:

/v.?i.?a.?g.?r.?a/

It allows zero or one arbitrary character between each pair of adjacent letters.
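
A pattern of this shape can be generated from any word rather than written by hand. The sketch below uses Python's `re` module; the `loose_pattern` helper name and the case-insensitive flag are assumptions for illustration.

```python
import re


def loose_pattern(word: str) -> re.Pattern:
    """Build a pattern that allows at most one arbitrary character
    between adjacent letters of word (hypothetical helper)."""
    body = ".?".join(re.escape(ch) for ch in word)
    return re.compile(body, re.IGNORECASE)


viagra = loose_pattern("viagra")  # equivalent to v.?i.?a.?g.?r.?a
print(bool(viagra.search("vi*agra")))   # one extra character between letters
print(bool(viagra.search("viZagra")))   # an extra letter is tolerated too
print(bool(viagra.search("vi**agra")))  # two extra characters in a row
```

Because `.` matches any character except a newline, this pattern is quite permissive and may produce false positives on ordinary text, so it is probably best combined with other checks.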

Tags:

regex

Fuxi
