I'm practicing with Ruby and regex to delete certain unwanted characters. For example:
input = input.gsub(/<\/?[^>]*>/, '')
and for special characters entered as HTML character codes, for example &#9787; or &#8482;:
input = input.gsub('&#', '')
This leaves only the numbers, which is fine. But it only works if the user enters the special character as a code, like this:
&#8482;
My question: how can I delete special characters if the user enters them directly, without a code, like this:
™ ☻
First of all, I think it might be easier to define what constitutes "correct input" and remove everything else. For example:
input = input.gsub(/[^0-9A-Za-z]/, '')
If that's not what you want (for example, you want to support non-Latin alphabets), then I think you should make a list of the glyphs you want to remove (like ™ or ☻) and remove them one by one, since it's hard to programmatically distinguish a character from Chinese, Arabic, or another script from a pictograph.
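As a rough sketch of both options: the first method removes an explicit list of glyphs, and the second is one possible way to approximate "correct input" for non-Latin text using Unicode property classes (`\p{L}` for letters, `\p{N}` for numbers). The names `UNWANTED_GLYPHS`, `strip_glyphs`, and `keep_letters_and_digits` are made up for illustration, and whether letters, numbers, and whitespace are the right set to keep is an assumption about your input.

```ruby
# Hypothetical blacklist; extend it with whatever glyphs you want gone.
UNWANTED_GLYPHS = ["™", "☻"].freeze

# Remove only the listed glyphs and leave everything else alone.
def strip_glyphs(input, glyphs = UNWANTED_GLYPHS)
  input.gsub(Regexp.union(glyphs), '')
end

# Keep letters from any script, numbers, and whitespace; drop the rest.
# Symbols such as ™ and ☻ are neither letters nor numbers, so they go too.
def keep_letters_and_digits(input)
  input.gsub(/[^\p{L}\p{N}\s]/, '')
end

strip_glyphs("Acme™ ☻ ok")                # ™ and ☻ removed
keep_letters_and_digits("abc™ 你好 ☻ 42")  # 你好 survives, ™ and ☻ do not
```

Note that the second method also strips punctuation, so it is closer to the whitelist idea than to the one-by-one list.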
Finally, you might want to normalize your input by converting to or from HTML escape sequences.
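A minimal sketch of that normalization, assuming the input is a UTF-8 string: decode any HTML escape sequences first with `CGI.unescapeHTML` from the standard library, so that `&#8482;` and a literal ™ end up as the same character, and only then apply the filter. The method name `normalize_and_clean` is hypothetical, and the filter is the whitelist from above.

```ruby
require 'cgi'

# Decode HTML escapes, then keep only ASCII letters and digits.
def normalize_and_clean(input)
  decoded = CGI.unescapeHTML(input)  # "&#8482;" becomes "™"
  decoded.gsub(/[^0-9A-Za-z]/, '')
end

normalize_and_clean("abc&#8482;™123") # => "abc123"
```

Decoding first means the digits of the code itself (8482) do not leak into the result, which is what happens if you only strip the `&#` prefix.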
If you just want ASCII characters, you can use:
original = "aøbauhrhræoeuacå"
cleaned = ""
original.each_byte { |x| cleaned << x unless x > 127 }
cleaned # => "abauhrhroeuac"
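For comparison, here are two shorter ways to get the same ASCII-only result without looping over bytes. This is only a sketch; the method names `ascii_only_gsub` and `ascii_only_encode` are made up, and both assume the input is a valid UTF-8 string.

```ruby
# Drop every character outside the 7-bit ASCII range.
def ascii_only_gsub(str)
  str.gsub(/[^\x00-\x7F]/, '')
end

# Transcode to US-ASCII, replacing anything unrepresentable with nothing.
def ascii_only_encode(str)
  str.encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")
end

original = "aøbauhrhræoeuacå"
ascii_only_gsub(original)   # => "abauhrhroeuac"
ascii_only_encode(original) # => "abauhrhroeuac"
```

The `encode` version returns a string tagged as US-ASCII rather than UTF-8, which may or may not matter depending on what you do with it next.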