 

Ruby transliteration using hash [closed]

Tags:

ruby

I am trying to transliterate Cyrillic to Latin using a hash. I am using # encoding: utf-8 and Ruby 1.9.3. I want this code to change the value of file_name. Why does it leave file_name unchanged?

abc = Hash.new
abc = {"a" => "a", "b" => "б", "v" => "в", 'g' => "г", 'd'=> "д", 'jo' => "ё", 'zh' => "ж", 'th' => "з", 'i' => "и", 'l' => "л", 'm' => "м", 'n' => "н",'p' => "п", 'r' => "р", 's' => "с", 't' => "т", 'u' => "у", 'f' => "ф", 'h' => "х", 'c' => "ц", 'ch' => "ч", 'sh' => "ш", 'sch' => "щ", 'y' => "ы",'u' => "ю", 'ja' => "я"} 
file_name.each_char do |c| 
     abc.each {|key, value| if c == value then c = key end }
end 
Rudziankoŭ asked Apr 10 '26 19:04



1 Answer

The problem with .each_char is that the block variable (c in your question) is not a reference back into the string, so assigning to it does not alter the string in place. You could make a per-character mapping work from there (using .map followed by .join, for instance), but that is inefficient compared to .tr! or .gsub! for your purpose, because breaking the string into an array of characters and reconstructing it creates many Ruby objects.
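A minimal sketch of that difference. The file_name value and the three-letter latin_of table are assumptions made up for illustration, not the asker's data:

```ruby
# encoding: utf-8

# Hypothetical sample data, for illustration only.
file_name = "бвг"
latin_of  = { "б" => "b", "в" => "v", "г" => "g" }

# Assigning to the block variable only rebinds the local name c;
# the string that each_char is iterating over is never touched.
file_name.each_char do |c|
  c = latin_of[c] if latin_of.key?(c)
end
unchanged = file_name.dup

# The map/join route does work, but it builds an intermediate array
# of one-character strings before joining them back together.
converted = file_name.each_char.map { |c| latin_of.fetch(c, c) }.join
```

Here unchanged still holds the Cyrillic text, while converted holds the Latin version as a new string.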

I think you need to do something like

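# The first source character must be the Cyrillic "а" (U+0430); a Latin "a" there would map to itself.
file_name.tr!( 'абвгдилмнпрстуфхцыю', 'abvgdilmnprstufhcyu' )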

which covers the single-letter conversions very efficiently. That leaves the multi-letter conversions. I would use gsub! for those, with an inverted copy of your hash:

latin_of = {"ё"=>"jo", "ж"=>"zh", "з"=>"th", "ч"=>"ch", 
            "ш"=>"sh", "щ"=>"sch", "я"=>"ja"}
file_name.gsub!( /[ёжзчшщя]/ ) { |cyrillic| latin_of[ cyrillic ] }

Note that, unlike with each_char, the return value of the block passed to .gsub! replaces whatever was matched in the original string. The above code uses an inversion of your original hash to quickly find the correct Latin replacement for the matched Cyrillic character.

You don't strictly need tr!. If you prefer, you can invert your whole original hash and do everything in one pass with the gsub! block form shown above. The cost of calling two methods probably means you don't gain much from .tr! anyway. Still, String#tr! is worth knowing about; it can be very handy.
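A rough sketch of that one-pass variant. The abc hash here is a shortened, illustrative subset of the table in the question, and the file_name value is hypothetical:

```ruby
# encoding: utf-8

# Shortened Latin => Cyrillic table in the question's layout (illustrative subset).
abc = { "b" => "б", "v" => "в", "zh" => "ж", "sh" => "ш", "ja" => "я" }

# Hash#invert swaps keys and values, giving Cyrillic => Latin.
latin_of = abc.invert

file_name = "жвк.txt"   # hypothetical sample value

# One pass: match any lowercase Cyrillic letter and look it up.
# Letters missing from the table (here "к") are left as they are.
file_name.gsub!(/[а-яё]/) { |cyrillic| latin_of.fetch(cyrillic, cyrillic) }
```

One caveat when inverting the hash from the question: it lists 'u' twice, and Ruby keeps only the last value for a repeated key in a hash literal, so one of the two Cyrillic letters would be missing from the inverted table. Writing the table with Cyrillic keys in the first place avoids this.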


Edit: As suggested in comments, .gsub! can do a lot more for you here. Assuming latin_of was the complete hash with Cyrillic keys and the Latin values, you could do this:

file_name.gsub!( Regexp.union(latin_of.keys), latin_of )

Two things to note:

  • Regexp.union(latin_of.keys) takes the array of keys you want to convert and builds a single regular expression that matches any of them, so gsub! finds every character that needs replacing in the String

  • gsub! accepts a hash as the second parameter, and converts each match by looking it up as a key and replacing it with the associated value - exactly the behaviour you are looking for.
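Putting the two points together, a small sketch under stated assumptions: the latin_of table below is deliberately incomplete, and the file names are made up for illustration.

```ruby
# encoding: utf-8

# Hypothetical, deliberately incomplete Cyrillic => Latin table.
latin_of = {
  "б" => "b",  "в" => "v",  "г" => "g",  "д" => "d",
  "ж" => "zh", "ч" => "ch", "ш" => "sh", "щ" => "sch"
}

# One pattern that matches any key of the table.
pattern = Regexp.union(latin_of.keys)

file_name = "вдч.txt"

# Non-destructive gsub returns a new string and leaves file_name alone.
result = file_name.gsub(pattern, latin_of)

# The destructive form returns nil when nothing was replaced,
# so its return value should not be relied on as the converted string.
untouched = "plain.txt".gsub!(pattern, latin_of)
```

The pattern only needs to be built once and can be reused for every file name.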

Neil Slater answered Apr 13 '26 13:04



