
Standardize a String for Filename, remove accents and special chars

I'm trying to find a way to normalize a string to pass it as a filename.

I have this so far:

my_string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n, '').downcase.gsub(/[^a-z]/, '_')

The first problem is the - character, and I guess there are more problems with this method.

I don't control the name; the string can contain accents, white space and special characters. I want to replace each accented letter with the corresponding plain letter ('é' => 'e') and replace everything else with the '_' character.

The names are like:

  • "Prélèvements - Routine"
  • "Carnet de santé"
  • ...

I want them to be like a filename with no space/special chars:

  • "prelevements_routine"
  • "carnet_de_sante"
  • ...

Thanks for the help :)

MrYoshiji asked Nov 21 '12 22:11



2 Answers

Take a look at ActiveSupport::Inflector.transliterate. It is very useful for handling this kind of character problem: it replaces non-ASCII characters with an ASCII approximation. See the ActiveSupport::Inflector documentation for details.

Then, you could do something like:

ActiveSupport::Inflector.transliterate my_string.downcase.gsub(/\s/,"_")
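Note that the one-liner above only replaces whitespace, so a name such as "Prélèvements - Routine" keeps its hyphen. As a minimal sketch that needs no ActiveSupport, assuming Ruby 2.2+ for String#unicode_normalize, the whole transformation could look like the following. This swaps transliterate for a different technique (decompose the accented letters, then drop the combining marks), and to_filename is a hypothetical helper name, not part of Ruby or Rails:

```ruby
# Hypothetical helper, not part of Ruby or Rails.
def to_filename(name)
  name
    .unicode_normalize(:nfkd)  # split "é" into "e" plus a combining accent
    .gsub(/\p{Mn}/, "")        # drop the combining marks
    .downcase
    .gsub(/[^a-z0-9]+/, "_")   # collapse every other run of characters into one "_"
    .gsub(/\A_+|_+\z/, "")     # trim leading and trailing underscores
end

puts to_filename("Prélèvements - Routine") # prints prelevements_routine
puts to_filename("Carnet de santé")        # prints carnet_de_sante
```

Letters that have no decomposition (for example "ø" or "ß") are not mapped to ASCII by this approach and end up as "_", which is one reason transliterate may still be the better choice inside a Rails app.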
Dario Barrionuevo answered Oct 17 '22 07:10



Use ActiveStorage::Filename#sanitized, if spaces are okay.

If spaces are okay (and I would suggest keeping them if this is a user-provided and/or user-downloadable file), you can use the ActiveStorage::Filename#sanitized method, which is meant for exactly this situation.

It replaces special characters that are not allowed in a file name with a dash, while keeping the characters that users typically rely on to organize and describe their files, like spaces and ampersands (&).

ActiveStorage::Filename.new( "Prélèvements - Routine" ).sanitized 
#=> "Prélèvements - Routine"

ActiveStorage::Filename.new( "Carnet de santé" ).sanitized 
#=> "Carnet de santé"

ActiveStorage::Filename.new( "Foo:Bar / Baz.jpg" ).sanitized 
#=> "Foo-Bar - Baz.jpg"

Use String#parameterize, if you want to remove nearly everything.

And if you're really looking to remove everything, try String#parameterize:

"Prélèvements - Routine".parameterize
#=> "prelevements-routine"

"Carnet de santé".parameterize
#=> "carnet-de-sante"

"Foo:Bar / Baz.jpg".parameterize
#=> "foo-bar-baz-jpg"
Joshua Pinter answered Oct 17 '22 08:10
