I have a form on my page where a user can type some text and submit it. The text is then sent to the server (a REST API on top of node.js) and saved to the DB (postgres).
The problem is that some strange characters (control characters) are occasionally saved to the DB, for example the escape control character (^[) or the backspace control character (^H). Generally this does not break anything, since those characters are invisible and the HTML is rendered correctly. However, when I provide XML content for RSS readers, the readers return "Malformed XML" because of those control characters (it works after deleting them).
My question is: how can I remove those characters from a string on the client side (javascript) or on the server side (javascript/node.js)?
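If Python is an option, the one-liner ''.join(c for c in s if unicodedata.category(c)[0] != 'C') (after import unicodedata) removes all control characters from the original string s. More precisely, it drops every character whose Unicode category starts with C: control, format, surrogate, private-use and unassigned characters.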
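Since the question is about JavaScript, a rough equivalent of that Python one-liner can be sketched with Unicode property escapes. The function name stripOtherCategory and the sample string are made up for illustration; this assumes a runtime that supports \p{...} in regular expressions (recent browsers and Node.js versions do).

```javascript
// Sketch: remove every code point in the Unicode general category "Other" (C*),
// mirroring the Python test category(c)[0] != 'C'.
// stripOtherCategory is a hypothetical name, not part of any library.
function stripOtherCategory(s) {
  // The u flag is required for \p{...} property escapes.
  return s.replace(/\p{C}/gu, "");
}

// Sample input with an ESC (U+001B) and a backspace (U+0008).
const sample = "a\u001Bb\u0008c";
console.log(stripOtherCategory(sample)); // prints "abc"
```

Note that \p{C} also matches format characters such as the zero width joiner used inside some emoji sequences. If only true control characters should go, \p{Cc} is the narrower choice.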
We can use the string replace() method to replace a character with another one. If we pass an empty string as the second argument, the character is removed from the string.
Take the string, use the string replace function to replace any illegal character (or character range) with '', and then save that instead.
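One detail worth illustrating: when replace() is given a plain string as the pattern, it only touches the first match, so a regular expression with the g flag is needed to remove every occurrence. The input below is a made-up sample, not data from the question.

```javascript
// Hypothetical sample input containing two backspace characters (U+0008).
// In a JavaScript string literal, "\b" is the backspace character.
const input = "ab\bcd\bef";

// A string pattern removes only the first match.
const firstOnly = input.replace("\b", "");

// A regular expression with the g flag removes every match.
// \x08 is used because \b inside a regex means "word boundary".
const all = input.replace(/\x08/g, "");

console.log(JSON.stringify(firstOnly)); // one backspace is still present
console.log(JSON.stringify(all));       // prints "abcdef" (with quotes)
```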
Control characters in Unicode are at codepoints U+0000 through U+001F and U+007F through U+009F. Use a RegExp to find those control characters and replace them with an empty string:
str.replace(/[\u0000-\u001F\u007F-\u009F]/g, "")
If you want to remove additional characters, add them to the character class inside the RegExp. For example, to remove U+200B ZERO WIDTH SPACE as well, add \u200B before the ].
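Put together, the approach could look like the sketch below, usable in the browser before submitting or in the node.js handler before saving. The helper name stripControlChars and the sample text are hypothetical. Unlike the expression above, this variant deliberately keeps tab, line feed and carriage return, on the assumption that multi-line input should survive; XML 1.0 permits those three characters.

```javascript
// Hypothetical helper, not taken from the original answer.
// Matches C0 and C1 control characters except tab (U+0009), line feed (U+000A)
// and carriage return (U+000D), plus U+200B ZERO WIDTH SPACE as an extra.
const CONTROL_CHARS =
  /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F\u200B]/g;

function stripControlChars(text) {
  // String() guards against non-string input such as null or numbers.
  return String(text).replace(CONTROL_CHARS, "");
}

// Sample: ESC (U+001B), backspace (U+0008) and a zero width space in the input.
const dirty = "Hello\u001B, wor\u0008ld\u200B!\nSecond line";
const clean = stripControlChars(dirty);

console.log(JSON.stringify(clean)); // prints "Hello, world!\nSecond line"
```

Cleaning on the server is the safer place if only one side is changed, since client-side code can be bypassed by anything that calls the API directly.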