
String encoding via ruby: capturing user input safely

I've searched high and low for a simple solution. None of the ones I found were simple or 'just worked'.

To start, I keep getting this error:

ArgumentError: invalid byte sequence in US-ASCII

This happens because users are copying and pasting content from Microsoft Word. I just want a way to sanitize user input so that it is stored in the database in the proper encoding, regardless of what encoding they send me. Even if it completely destroys their input, I'm fine with that. I just want to force their input into an encoding that won't raise errors later.
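A minimal reproduction of the error, for reference. It assumes the pasted text arrived as UTF-8 bytes (Word's curly quotes) but got tagged as US-ASCII somewhere upstream; the variable name `pasted` is made up for illustration. Regex matching is one of the operations that checks the byte sequence and raises.

```ruby
# Curly quotes as UTF-8 bytes, tagged as US-ASCII (assumed upstream mislabel).
pasted = "\u201Cquoted\u201D".b.force_encoding('US-ASCII')

puts pasted.valid_encoding?   # false: bytes above 0x7F are not US-ASCII

begin
  pasted =~ /quoted/          # regex matching validates the bytes first
rescue ArgumentError => e
  puts e.message              # invalid byte sequence in US-ASCII
end
```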

I've tried:

require 'iconv'
ic = Iconv.new('US-ASCII//IGNORE', 'US-ASCII')
safe_string = ic.iconv(unsafe_string)

After doing the above and resaving the new string, the error still persists. I've also tried:

safe_string = unsafe_string.force_encoding('US-ASCII')

That still gives me errors.

I've also tried both of the above with UTF-8, with the same result.
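Why `force_encoding` can't help here, as a short sketch: it only changes the label Ruby attaches to the bytes and never rewrites, removes, or validates them. Relabeling as UTF-8 works only if the bytes already happen to be UTF-8. The Windows-1252 sample below is an assumption about one way Word text can arrive.

```ruby
original  = "caf\u00E9"                            # UTF-8 bytes: 63 61 66 C3 A9
relabeled = original.dup.force_encoding('US-ASCII')

puts relabeled.bytes == original.bytes   # true: the bytes are untouched
puts relabeled.encoding                  # US-ASCII
puts relabeled.valid_encoding?           # false: C3 A9 are not US-ASCII

# Windows-1252 text (0x93/0x94 are curly quotes there) is not valid UTF-8,
# so relabeling it as UTF-8 leaves it just as broken.
cp1252 = "\x93hi\x94".b.force_encoding('UTF-8')
puts cp1252.valid_encoding?              # false
```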

Isn't there something simple I can do to convert their string properly before it's saved in the database? Thanks.

Binary Logic asked Nov 15 '22 02:11


1 Answer

I think I found the solution myself. So, if you want to force encode the string to your current encoding you can do something like:

safe_string = unsafe_string.encode('US-ASCII', :invalid => :replace, :undef => :replace)
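One caveat worth knowing, sketched below: if the string is already tagged US-ASCII, encoding it to US-ASCII is a same-encoding conversion, which Ruby skips, so `:undef => :replace` alone never gets a chance to run and the invalid bytes survive. Two ways around that are shown: the two-argument form of `String#encode`, which names the encoding the bytes are really in (UTF-8 is an assumption here; this is roughly what the Iconv call in the question was attempting), and `String#scrub`, available from Ruby 2.1.

```ruby
# Bytes of "café" in UTF-8 that ended up tagged as US-ASCII.
broken = "caf\xC3\xA9".b.force_encoding('US-ASCII')

# Source and destination encodings match, so Ruby skips the conversion and
# :undef => :replace has nothing to act on: the invalid bytes remain.
same = broken.encode('US-ASCII', :undef => :replace)
puts same.valid_encoding?      # false

# Option 1: say which encoding the bytes are actually in (assumed UTF-8),
# then transcode; é has no US-ASCII mapping, so it becomes '?'.
ascii = broken.encode('US-ASCII', 'UTF-8', :invalid => :replace, :undef => :replace)
puts ascii                     # caf?

# Option 2 (Ruby 2.1+): keep the encoding and replace the invalid bytes.
scrubbed = broken.scrub('?')
puts scrubbed.valid_encoding?  # true
```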

But really, I would recommend using UTF-8. I'm not sure why my default encoding was set to US-ASCII; I had assumed Rails set the default to UTF-8. Anyway, the following also fixed the problem:

Encoding.default_internal = 'UTF-8'
Encoding.default_external = 'UTF-8'
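For context, a small sketch of what these two settings control: `default_external` is the encoding Ruby assumes for data read through IO (files, pipes, STDIN) when none is given, and `default_internal` is the encoding it transcodes that input into. They don't repair strings that are already in memory. The temp file and sample text are only for illustration.

```ruby
require 'tempfile'

Encoding.default_external = 'UTF-8'  # assumed encoding of bytes read from IO
Encoding.default_internal = 'UTF-8'  # encoding IO input is transcoded into

Tempfile.create('pasted') do |f|
  f.binmode
  f.write("\u201CSmart quotes\u201D".b)  # raw UTF-8 bytes on disk
  f.flush
  text = File.read(f.path)                # no explicit encoding given
  puts text.encoding                      # UTF-8
  puts text.valid_encoding?               # true
end
```

If the incoming bytes are not actually UTF-8, these settings only change the label; the bytes still need converting or scrubbing.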

I put this in an initializer. If anyone has a better suggestion, please let me know. But I believe UTF-8 is the most widely used encoding, and several sites I read recommended it.
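Going one step further toward the "sanitize before saving" goal from the question, here is a sketch of a normalizer that always returns valid UTF-8. The helper name `sanitize_utf8` is hypothetical, and the Windows-1252 fallback is an assumption about where non-UTF-8 pastes from Word come from; any bytes that still don't fit are replaced with U+FFFD.

```ruby
# Hypothetical helper: coerce arbitrary input into valid UTF-8.
def sanitize_utf8(input)
  str = input.to_s.dup.force_encoding(Encoding::UTF_8)
  return str if str.valid_encoding?   # already valid UTF-8: keep as-is

  begin
    # Assumption: non-UTF-8 text is most likely Windows-1252 from Word.
    str.encode(Encoding::UTF_8, Encoding::Windows_1252)
  rescue EncodingError
    # Not valid Windows-1252 either: replace the offending bytes.
    str.scrub("\uFFFD")
  end
end

puts sanitize_utf8("\x93quoted\x94".b)   # “quoted”
```

In a Rails model this could be called from a `before_validation` callback on the affected attributes, so nothing unsanitized reaches the database.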

Thanks.

Binary Logic answered Dec 22 '22 09:12