I've searched high and low for a simple solution. None have been simple or 'just worked'.
To start, I keep getting this error:
ArgumentError: invalid byte sequence in US-ASCII
This happens because users are copying and pasting content from Microsoft Word. I just want a way to sanitize user input so that it is stored in the database in the proper encoding, regardless of what encoding they send me. Even if it completely mangles their input, I'm not concerned about that. I just want to force their input into an encoding that won't cause errors later.
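A minimal way to reproduce the error, assuming the pasted text reaches the app as raw bytes that Ruby has labelled US-ASCII. Here the UTF-8 bytes for "é" (0xC3 0xA9) stand in for whatever non-ASCII characters Word produces; that choice of sample bytes is an assumption. Regex-based methods such as `gsub` raise as soon as they scan the invalid bytes:

```ruby
# Assumed input: UTF-8 bytes for "café", but labelled as US-ASCII.
pasted = "caf\xC3\xA9".dup.force_encoding(Encoding::US_ASCII)

puts pasted.valid_encoding? # prints false: 0xC3 and 0xA9 are not 7-bit ASCII

begin
  pasted.gsub(/a/, "A")
rescue ArgumentError => e
  puts e.message # prints "invalid byte sequence in US-ASCII"
end
```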
I've tried:
ic = Iconv.new('US-ASCII//IGNORE', 'US-ASCII')
safe_string = ic.iconv(unsafe_string)
After doing the above and saving the new string again, the error still persists. I've also tried:
safe_string = unsafe_string.force_encoding('US-ASCII')
This still gives me errors.
I've also tried the above with UTF-8. Same thing.
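A likely reason the `force_encoding` attempts change nothing: `force_encoding` only swaps the label Ruby attaches to a string and never touches the bytes. The Iconv attempt has a related problem, since it declares the source as US-ASCII, which the pasted bytes are not. A minimal sketch, assuming the pasted text is the UTF-8 bytes for "café" mislabelled as US-ASCII, plus (another assumption) Windows-1252 smart quotes 0x93/0x94 of the kind Word often emits:

```ruby
# Assumed input: UTF-8 bytes for "café", mislabelled as US-ASCII.
pasted = "caf\xC3\xA9".dup.force_encoding(Encoding::US_ASCII)

# force_encoding only swaps the label; the bytes are untouched.
relabelled = pasted.dup.force_encoding(Encoding::US_ASCII)
puts relabelled.bytes == pasted.bytes # prints true
puts relabelled.valid_encoding?       # prints false

# Relabelling as UTF-8 only "works" when the bytes happen to be valid UTF-8...
puts pasted.dup.force_encoding(Encoding::UTF_8).valid_encoding? # prints true

# ...which Windows-1252 smart quotes (assumed Word output) are not.
smart_quotes = "\x93quoted\x94".dup.force_encoding(Encoding::UTF_8)
puts smart_quotes.valid_encoding? # prints false
```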
Isn't there something simple I can do to convert their string properly before it's saved in the database? Thanks.
I think I found the solution myself. If you want to force the string into your current encoding, you can do something like:
safe_string = unsafe_string.encode('US-ASCII', :invalid => :replace, :undef => :replace)
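A sketch of how the replacement options behave, assuming the input is correctly labelled UTF-8. `:undef => :replace` handles characters that have no US-ASCII equivalent, while `:invalid => :replace` handles malformed byte sequences. Per the Ruby documentation, `encode` to the string's own encoding is a no-op, so for a string that is already labelled US-ASCII but contains stray high bytes, `String#scrub` (Ruby 2.1+) is the method that replaces them:

```ruby
utf8 = "caf\u00E9 \u201Cquoted\u201D" # "café “quoted”" as valid UTF-8

# Non-ASCII characters become "?" (the default replacement for non-Unicode targets).
ascii = utf8.encode(Encoding::US_ASCII, invalid: :replace, undef: :replace)
puts ascii # prints caf? ?quoted?

# A custom replacement string can drop them entirely instead.
stripped = utf8.encode(Encoding::US_ASCII, invalid: :replace, undef: :replace, replace: "")
puts stripped # prints caf quoted

# Already labelled US-ASCII but holding high bytes: encode would be a no-op here,
# so scrub replaces each invalid byte instead.
broken = "caf\xC3\xA9".dup.force_encoding(Encoding::US_ASCII)
puts broken.scrub("?") # prints caf??
```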
But really, I would recommend using UTF-8. I'm not sure why my default encoding was set to US-ASCII; I assumed Rails set the default to UTF-8. Anyway, doing the following fixed the problem as well:
Encoding.default_internal = 'UTF-8'
Encoding.default_external = 'UTF-8'
I put this in an initializer. If anyone has better suggestions, please let me know. I believe UTF-8 is the most widely used encoding, and several sites I read recommend it.
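If you go the UTF-8 route, note that, as far as I know, the encoding defaults mainly govern how data read from IO and similar sources is labelled and transcoded; a string that already holds invalid bytes still has to be cleaned. A minimal sketch of a hypothetical `to_utf8` helper (not a Ruby or Rails API), assuming that anything which is not valid UTF-8 came from Word as Windows-1252:

```ruby
# Hypothetical helper: normalise arbitrary input to valid UTF-8.
# Assumption: input that is not valid UTF-8 was sent as Windows-1252 (common for Word).
def to_utf8(input)
  str = input.to_s.dup.force_encoding(Encoding::UTF_8)
  return str if str.valid_encoding? # bytes were already valid UTF-8

  # Reinterpret the bytes as Windows-1252 and transcode; bytes that
  # Windows-1252 leaves undefined become U+FFFD.
  str.force_encoding(Encoding::Windows_1252)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

puts to_utf8("caf\xC3\xA9")    # prints café (already valid UTF-8)
puts to_utf8("\x93quoted\x94") # prints “quoted” (smart quotes decoded as Windows-1252)
```

Guessing Windows-1252 is a heuristic; if your users paste from other sources, the fallback encoding may need to change.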
Thanks.