
How can I delete special characters?

I'm practicing with Ruby and regex to delete certain unwanted characters. For example:

input = input.gsub(/<\/?[^>]*>/, '') 

and for special characters, for example ☻ or ™:

input = input.gsub('&#', '') 

That strips the &# prefix and leaves just the number, which is fine. But it only works if the user enters the special character as a numeric code, like this:

&#153; 

My question: how can I delete special characters if the user enters them directly, without a code, like this:

™ ☻ 
Yud asked Apr 10 '09 12:04



2 Answers

First of all, I think it might be easier to define what constitutes "correct input" and remove everything else. For example:

input = input.gsub(/[^0-9A-Za-z]/, '') 

If that's not what you want (for example, you need to support non-Latin alphabets), then I think you should make a list of the glyphs you want to remove (like ™ or ☻) and remove them one by one, since it's hard to distinguish programmatically between a Chinese or Arabic character and a pictograph.
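A minimal sketch of the "list of glyphs" idea, assuming a small denylist is enough for your input. The constant UNWANTED_GLYPHS and the method strip_unwanted_glyphs are hypothetical names chosen for illustration; only String#delete is standard Ruby.

```ruby
# Hypothetical denylist: extend it with whichever glyphs you want to reject.
UNWANTED_GLYPHS = "™☻".freeze

def strip_unwanted_glyphs(input)
  # String#delete removes every character that appears in its argument,
  # so letters from other alphabets are left untouched.
  input.delete(UNWANTED_GLYPHS)
end

puts strip_unwanted_glyphs("Ruby™ 你好 ☻")
```

Note that String#delete gives special meaning to ^, - and \ in its argument, so escape those if they ever end up in the list.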

Finally, you might want to normalize your input first by converting to or from HTML escape sequences, so that a character and its escaped form are handled the same way.
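One way to sketch the normalization step is to decode decimal escape sequences into real characters first and then apply a single filter to the result. The helpers decode_numeric_entities and sanitize below are hypothetical names; the sketch only handles decimal references of the form &#NNN; (no hex or named entities) and does no validation of the code points.

```ruby
# Hypothetical helper: turn decimal references such as "&#8482;" into characters.
# Limiting the match to six digits keeps the value inside the Unicode range.
def decode_numeric_entities(input)
  input.gsub(/&#(\d{1,6});/) { [Regexp.last_match(1).to_i].pack("U") }
end

# Decode first, then keep only ASCII letters and digits (the filter from above).
def sanitize(input)
  decode_numeric_entities(input).gsub(/[^0-9A-Za-z]/, "")
end

puts sanitize("Ruby&#8482; and Ruby™")
```

With this order of operations the escaped and the literal trademark sign are both removed by the same rule, and no stray digits from the escape sequence are left behind.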

Can Berk Güder answered Sep 20 '22 10:09


If you just want ASCII characters, you can use:

original = "aøbauhrhræoeuacå"
cleaned = ""
original.each_byte { |x| cleaned << x unless x > 127 }
cleaned   # => "abauhrhroeuac"
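The same ASCII-only filter can probably be written more compactly with a regex. This is a sketch of an alternative, not the answerer's method, and it assumes the input is a valid UTF-8 string:

```ruby
original = "aøbauhrhræoeuacå"

# Remove every character outside the 7-bit ASCII range.
cleaned = original.gsub(/[^\x00-\x7F]/, "")

puts cleaned
```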
Matthew Schinckel answered Sep 19 '22 10:09