
Smarter character replacement using Ruby gsub and regexp

I'm trying to create permalink-like behavior for some article titles, and I don't want to add a new DB field for the permalink. So I decided to write a helper that will convert my article title from:

"O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi" to "o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti".

I figured out how to replace spaces with hyphens and remove other special characters (other than -) using:

title.gsub(/\s/, "-").gsub(/[^\w-]/, '').downcase

I am wondering whether there is a way to replace each character with a specific counterpart in a single .gsub call, so I won't have to chain title.gsub("ă", "a") calls for every non-ASCII character in my locale.

I was thinking of building a hash with all the special characters and their counterparts, but I haven't yet figured out how to use variables with regexps.

What I was looking for is something like:

title.gsub(/\s/, "-").gsub(*replace character goes here*).gsub(/[^\w-]/, '').downcase

Thanks!

alex.g asked Jan 23 '23 05:01



2 Answers

I solved this in my application by using the Unidecoder gem:

require 'unidecode'

def uninternationalize(str)
  Unidecoder.decode(str).gsub("[?]", "").gsub(/`/, "'").strip
end

Daniel Vandersluis answered Jan 26 '23 00:01



If you only want to transliterate single characters, you can use the String#tr method, which does exactly the same thing as the Unix tr command: it replaces every character in the first list with the character in the same position in the second list:

'Ünicöde'.tr('ÄäÖöÜüß', 'AaOoUus') # => "Unicode"
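Applied to the title from the question, the same idea might look like the sketch below. The helper name permalink and the two character lists are assumptions made for illustration; the lists cover only the Romanian diacritics (in both their comma-below and cedilla forms) and are not a complete mapping.

```ruby
# Hypothetical helper: the name and the character lists are assumptions,
# covering only Romanian diacritics in precomposed form.
def permalink(title)
  title
    .tr('ăâîșşțţĂÂÎȘŞȚŢ', 'aaissttAAISSTT') # one-to-one replacement by position
    .gsub(/\s+/, '-')                        # runs of whitespace become one hyphen
    .gsub(/[^\w-]/, '')                      # drop everything except word chars and hyphens
    .downcase
end

puts permalink('O "focoasă" a pornit cruciada, împotriva bărbaţilor zgârciţi')
# prints: o-focoasa-a-pornit-cruciada-impotriva-barbatilor-zgarciti
```

Both lists must stay the same length and in the same order, which is the part that tends to become hard to maintain as the set of characters grows.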

However, I agree with @Daniel Vandersluis: it would probably be a good idea to use a more specialized library. Stuff like this can get really tedious, really fast. Also, a lot of those characters have standardized transliterations (ä → ae, ö → oe, ..., ß → ss), and users may expect the transliterations to be correct. (I certainly don't like being called Jorg. If you really must, you may call me Joerg, but I very much prefer Jörg.) If you have a library that provides those transliterations, why not use it? Note that many transliterations are not single characters and thus can't be done with String#tr anyway.
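If you would rather stay with plain Ruby, multi-character transliterations can be handled by passing a hash as the second argument to String#gsub, which is also the single-call form the question was after. The following is only a minimal sketch: the constant names, the slugify helper, and the contents of the mapping are assumptions for illustration, and it relies on String#downcase handling non-ASCII letters (Ruby 2.4 or later).

```ruby
# Hypothetical mapping: extend it for your own locale. The German entries
# use the multi-character transliterations (ä → ae, ß → ss).
TRANSLITERATIONS = {
  'ă' => 'a', 'â' => 'a', 'î' => 'i',
  'ș' => 's', 'ş' => 's', 'ț' => 't', 'ţ' => 't',
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'
}.freeze

# One pattern that matches any key of the hash.
TRANSLITERATION_PATTERN = Regexp.union(TRANSLITERATIONS.keys)

# Hypothetical helper name.
def slugify(title)
  title
    .downcase                                        # Unicode-aware in Ruby 2.4+
    .gsub(TRANSLITERATION_PATTERN, TRANSLITERATIONS) # each match is looked up in the hash
    .gsub(/\s+/, '-')                                # runs of whitespace become one hyphen
    .gsub(/[^\w-]/, '')                              # drop remaining special characters
end

puts slugify('Jörg Weiß')
# prints: joerg-weiss
```

Downcasing first keeps the hash to lowercase keys only. A dedicated library will still cover far more characters than a hand-written table like this one.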

Jörg W Mittag answered Jan 26 '23 00:01
