
python regex, remove escape characters and punctuation except for apostrophe

I have a string that looks like this:

"aaa\n\t\n asd123asd water's tap413 water blooe's"

How can I remove all escape characters, numbers, and punctuation except apostrophe using regex?

I'm pretty new to regex and would appreciate an explanation of what each expression means if the regex turns out to be complicated.

Eric Kim, asked Jan 30 '23 03:01


1 Answer

You're looking for a search-and-replace method, which in Python is re.sub().

Simply replace every run of characters that are not letters, apostrophes, or spaces ([^a-zA-Z' ]+) with '' (an empty string).

- But what about the escape characters?
A: Inside a normal string literal, each escape sequence becomes a single character. \n becomes a newline character, for example, which is neither a letter nor a ', so the pattern removes it.
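
To make the point about escape sequences concrete, here is a minimal sketch (the short sample string is made up for illustration) showing that \n inside a normal string literal is one character and that the character class removes it:

```python
import re

# In a normal (non-raw) string literal, \n is a single newline character.
sample = "a\nb"
print(len(sample))  # 3

# The newline is not a letter, apostrophe, or space, so it is removed.
stripped = re.sub("[^a-zA-Z' ]+", "", sample)
print(stripped)  # ab
```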

If, however, your string contains a literal backslash (for example "abc\\nefg", which holds a backslash followed by the letter n), add \\\\.| at the start of your regex. It matches a backslash plus the character that follows it, so the pattern becomes: \\\\.|[^a-zA-Z' ]+
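
As a rough illustration of the literal-backslash case, the sketch below uses a raw string for both the input and the pattern, so r"\\." is the same regex as "\\\\." written in a normal string. The sample text and variable names are made up for this example:

```python
import re

# Raw string: the text really contains a backslash followed by the letter n.
s = r"abc\nefg water's 42"

# Without the extra alternative, only the backslash is removed; the n stays.
plain = re.sub(r"[^a-zA-Z' ]+", "", s)

# With it, the backslash and the character after it are removed together.
cleaned = re.sub(r"\\.|[^a-zA-Z' ]+", "", s)

print(repr(plain))    # "abcnefg water's "
print(repr(cleaned))  # "abcefg water's "
```

Note that the space before the removed digits is kept, so the result ends with a trailing space; call .strip() on it if that matters.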

Here is a working example:

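import re

s = "aaa\n\t\n asd123asd water's tap413 water blooe's"
# Remove every run of characters that is not a letter, an apostrophe, or a space.
replaced = re.sub("[^a-zA-Z' ]+", '', s)
print(replaced)  # aaa asdasd water's tap water blooe's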

https://repl.it/repls/ReasonableUtterAnglerfish


Would appreciate it if you can explain what each expression means

So, the explanation:

  • \\\\ - Matches a literal backslash. Why four? In a normal (non-raw) Python string literal, each pair \\ becomes a single backslash, so the regex engine receives \\, which is how you match a backslash in regex.
  • . - Matches any character except the newline character.
  • | - OR expression, matches what is before OR what is after.
  • [^...] - Matches any single character that is NOT one of the characters listed inside the brackets.
  • a-zA-Z'  - The characters listed inside the brackets: a to z, A to Z, the apostrophe, and the space.
  • + - Quantifier meaning "one or more occurrences of the preceding term". It is not needed here, but it lets each run of unwanted characters be replaced in a single match, which reduces the number of matches and hence the execution time.
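
Two of the points above can be checked with a short sketch that reuses the string from the question. re.subn() returns the new string together with the number of replacements made, which shows what the + quantifier changes. The variable names are chosen only for this illustration:

```python
import re

s = "aaa\n\t\n asd123asd water's tap413 water blooe's"

# Same output with or without +, but far fewer replacements with it.
with_plus, n_with = re.subn("[^a-zA-Z' ]+", "", s)
without_plus, n_without = re.subn("[^a-zA-Z' ]", "", s)
print(with_plus == without_plus)  # True
print(n_with, n_without)          # 3 9

# Four backslashes in a normal string equal two in a raw string.
print("\\\\." == r"\\.")          # True
```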
Mateus, answered Jan 31 '23 17:01