I want to get the word count of a multi-language text.
For example, given a text that mixes English and Chinese, such as "The last Olympics was held in 北京", the count should be 8: six English words plus two Chinese characters, the same way Microsoft Word counts words.
What's the best way to do that in Ruby and in JavaScript?
I have a solution based on "How can I detect CJK characters in a string in Ruby?":
s = 'The last Olympics was held in 北京'

class String
  # True if the string contains at least one CJK character.
  def contains_cjk?
    !!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)
  end
end

s.split.inject(0) do |sum, word|
  if word.contains_cjk?
    # String#length counts characters only in Ruby 1.9+;
    # in 1.8 it counts bytes, so another method is needed there.
    sum + word.length
  else
    sum + 1
  end
end
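
One limitation of the snippet above is that a whitespace-delimited chunk mixing both scripts (for example "in北京") is counted entirely by character length. A possible variation, shown here only as a minimal sketch, is to tokenize with `String#scan` so that each CJK character is one token and each run of other non-whitespace characters is one token. The names `CJK_CLASS`, `TOKEN`, and `mixed_word_count` are made up for this illustration, and the sketch assumes Ruby 1.9 or later with UTF-8 strings. It does not try to reproduce Microsoft Word's exact rules; standalone punctuation, for instance, still counts as a token.

```ruby
# Hypothetical helper names; assumes Ruby 1.9+ and UTF-8 encoded input.
# Unicode script properties that are treated as "one character = one word".
CJK_CLASS = '\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}'

# A token is either a single CJK character,
# or a run of characters that are neither whitespace nor CJK.
TOKEN = /[#{CJK_CLASS}]|[^\s#{CJK_CLASS}]+/

def mixed_word_count(text)
  text.scan(TOKEN).length
end

mixed_word_count('The last Olympics was held in 北京') # => 8
mixed_word_count('in北京')                              # => 3
```

Compared with the `split`/`inject` version, this avoids monkey-patching `String` and handles words that are glued to CJK characters without a space.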