I have a C# routine that imports data from a CSV file, matches it against a database and then rewrites it to a file. The source file seems to have a few non-ASCII characters that are fouling up the processing routine.
I already have a static method that I run each input field through but it performs basic checks like removing commas and quotes. Does anybody know how I could add functionality that removes non-ASCII characters too?
Remove Non-ASCII Characters From Text Python Here we can use the replace() method for removing the non-ASCII characters from the string. In Python the str. replace() is an inbuilt function and this method will help the user to replace old characters with a new or empty string.
In python, to remove non-ASCII characters in python, we need to use string. encode() with encoding as ASCII and error as ignore, to returns a string without ASCII character use string. decode().
In order to use non-ASCII characters, Python requires explicit encoding and decoding of strings into Unicode. In IBM® SPSS® Modeler, Python scripts are assumed to be encoded in UTF-8, which is a standard Unicode encoding that supports non-ASCII characters.
Here a simple solution:
public static bool IsASCII(this string value) { // ASCII encoding replaces non-ascii with question marks, so we use UTF8 to see if multi-byte sequences are there return Encoding.UTF8.GetByteCount(value) == value.Length; }
source: http://snipplr.com/view/35806/
string sOut = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With