In Ruby, this is how you check whether a string contains a substring:
str = "hello world"
str.include?("lo")
=> true
When I try to save an emoji to a text column in a Rails application (the column lives in a MySQL database with the utf8 character set), I get this error:
Incorrect string value: \xF0\x9F\x99\x82
For my Rails application, it is enough to detect whether the submitted text contains an emoji and, if it does, raise a validation error. Example:
class MyModel < ApplicationRecord
  validate :cannot_contain_emojis

  private

  def cannot_contain_emojis
    if my_column.include?("/\xF0")
      errors.add(:my_column, 'Cannot include emojis')
    end
  end
end
Note: I am checking for \xF0 because, according to this site, all or most emoji begin with this byte.
This does not work, however. The check keeps returning false even when an emoji is present. I'm pretty sure the issue is that my include? call fails because the emoji is not converted to bytes for the comparison.
Question: How can I write a validation that rejects text containing an emoji?
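On the hunch about bytes in the question: a minimal sketch, assuming a UTF-8 source file, of why the include? check is unlikely to ever fire. The literal "/\xF0" starts with a slash that is not in the text, and "\xF0" on its own is not a valid UTF-8 string, whereas the emoji is a single character made of four bytes. The variable names are made up for illustration.

```ruby
smiley = "\u{1F642}"  # the emoji from the error message
needle = "\xF0"       # lone lead byte from the question, without the stray slash

# The emoji is one character but four bytes.
smiley.length                               # => 1
smiley.bytes.map { |b| format("%02X", b) }  # => ["F0", "9F", "99", "82"]

# A lone F0 byte is not a complete UTF-8 character.
needle.valid_encoding?                      # => false

# Comparing the raw (binary) views of both strings does find the byte.
smiley.b.include?(needle.b)                 # => true
```

The binary comparison works, but a lead byte of F0 only tells you the character takes four bytes in UTF-8, and that range holds more than emoji.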
You can use the Emoji Unicode property to test for emoji with a Regexp, something like this:
def cannot_contain_emojis
  if /\p{Emoji}/ =~ my_column
    errors.add(:my_column, 'Cannot include emojis')
  end
end
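To try the property outside Rails, here is a small sketch in plain Ruby. The helper names are hypothetical. One thing to watch: the Unicode Emoji property also covers the ASCII digits, # and *, because they are the base of keycap sequences, so the second helper subtracts those with a character class intersection. That exclusion is an assumption about what you want to allow, not part of the property itself.

```ruby
# Hypothetical helper: true if any character has the Emoji property.
def contains_emoji?(text)
  text.match?(/\p{Emoji}/)
end

# Hypothetical stricter variant: same property, minus ASCII digits, * and #.
def contains_non_ascii_emoji?(text)
  text.match?(/[\p{Emoji}&&[^0-9*#]]/)
end

contains_emoji?("hello \u{1F642}")           # => true
contains_emoji?("hello world")               # => false
contains_emoji?("room 101")                  # => true, digits have the Emoji property
contains_non_ascii_emoji?("room 101")        # => false
contains_non_ascii_emoji?("hi \u{1F642}")    # => true
```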
Unicode® Technical Standard #51 "UNICODE EMOJI" contains a more sophisticated regex:
\p{RI} \p{RI}
| \p{Emoji}
( \p{EMod}
| \x{FE0F} \x{20E3}?
| [\x{E0020}-\x{E007E}]+ \x{E007F} )?
(\x{200D} \p{Emoji}
( \p{EMod}
| \x{FE0F} \x{20E3}?
| [\x{E0020}-\x{E007E}]+ \x{E007F} )?
)*
[Note: some of those properties are not implemented in Onigmo / Ruby.]
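As a rough sketch of how that pattern might be written so Ruby accepts it, the version below swaps the short property names for explicit code point ranges: U+1F1E6 to U+1F1FF for the regional indicators (RI) and U+1F3FB to U+1F3FF for the skin-tone modifiers (EMod). The constant name is hypothetical, and the groups are made non-capturing so that scan returns whole matches.

```ruby
# Hypothetical constant: the UTS #51 pattern with RI and EMod spelled as ranges.
EMOJI_SEQUENCE = /
  [\u{1F1E6}-\u{1F1FF}]{2}
  | \p{Emoji}
    (?: [\u{1F3FB}-\u{1F3FF}]
      | \u{FE0F} \u{20E3}?
      | [\u{E0020}-\u{E007E}]+ \u{E007F} )?
    (?: \u{200D} \p{Emoji}
      (?: [\u{1F3FB}-\u{1F3FF}]
        | \u{FE0F} \u{20E3}?
        | [\u{E0020}-\u{E007E}]+ \u{E007F} )?
    )*
/x

thumbs = "\u{1F44D}\u{1F3FD}"                          # thumbs up with a skin-tone modifier
flag   = "\u{1F1EF}\u{1F1F5}"                          # two regional indicators
family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}" # three emoji joined by ZWJ

"a#{thumbs}b#{flag}c#{family}".scan(EMOJI_SEQUENCE)
# => [thumbs, flag, family], one match per visible emoji
```

Because it is built on \p{Emoji}, this pattern also matches a bare digit, # or *.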
However, checking for Emojis is probably not going to be enough. It is pretty clear that your text processing is broken at some point. And if it is broken by an Emoji, then there is a chance it will also be broken by my name, or the name of Ruby's creator 松本 行弘, or by the completely normal English word “naïve”.
Instead of playing a game of whack-a-mole trying to detect every Emoji, mathematical symbol, Arabic letter, typographically correct punctuation mark, etc., it would be much better simply to fix the text processing.
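If the failing layer is MySQL's utf8 character set, as described in the question, the limit is on characters that need four bytes in UTF-8 rather than on emoji as such. In MySQL versions where utf8 is an alias for utf8mb3, a character can take at most three bytes, and converting the column to utf8mb4 lifts that limit. A minimal sketch of which text crosses that line, using a hypothetical helper name:

```ruby
# Hypothetical helper: true if any character lies at U+10000 or above,
# which is exactly the set of characters that need four bytes in UTF-8.
def four_byte_chars?(text)
  text.match?(/[\u{10000}-\u{10FFFF}]/)
end

"ï".bytesize          # => 2
"松".bytesize          # => 3
"\u{1F642}".bytesize  # => 4

four_byte_chars?("naïve")          # => false
four_byte_chars?("松本 行弘")       # => false
four_byte_chars?("hi \u{1F642}")   # => true
```

A check like this matches the storage limit directly, so it would also catch four-byte characters that are not emoji, such as rare CJK ideographs.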