
Remove all characters except alphabets and numbers from a Ruby string

Tags:

string

ruby

I have a string input field in a form. I get that value in params hash. How should I remove all characters except alphabets and numbers from that string.

Anand asked Jan 10 '11 19:01


People also ask

How do I remove special characters from a string in Ruby?

In Ruby, the String#delete method removes the specified characters from a string. It returns a new string and leaves the original unchanged; String#delete! modifies the string in place.

How do you remove the last two characters of a string in Ruby?

The chop method removes the last character of a string. If the string ends with \r\n, both characters are removed. Calling chop on an empty string returns an empty string. To remove the last two characters, call chop twice.
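A minimal sketch of the chop behaviour described above. The sample strings are made up for illustration, and the range slice is an alternative not mentioned in the original text.

```ruby
# Sample values are hypothetical, chosen only to illustrate the behaviour.
word = "hello"

# chop returns a new string without the last character.
once = word.chop

# Calling chop twice drops the last two characters.
twice = word.chop.chop

# A range slice drops the last two characters in one step.
sliced = word[0...-2]

# A trailing "\r\n" is removed as a unit by a single chop.
crlf = "line\r\n".chop

# chop on an empty string returns an empty string.
empty = "".chop

puts once, twice, sliced, crlf.inspect, empty.inspect
```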

How do you know if a character is alphanumeric in Ruby?

To check whether a character is alphanumeric without needing to know which character it is, match it against the POSIX bracket expression: lookAhead =~ /[[:alnum:]]/. Ruby's documentation for regular expressions has more detail.
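The check above could be wrapped up roughly like this. The helper name alphanumeric? is an assumption made for the example and is not part of Ruby's String API.

```ruby
# alphanumeric? is a hypothetical helper name, not a built-in method.
def alphanumeric?(char)
  # \A and \z anchor the match so the whole input must be exactly
  # one alphanumeric character. match? returns true or false.
  char.match?(/\A[[:alnum:]]\z/)
end

puts alphanumeric?("a")
puts alphanumeric?("7")
puts alphanumeric?("_")
puts alphanumeric?("!")
```

If only ASCII letters and digits should count, the explicit class [A-Za-z0-9] can be used in place of [[:alnum:]].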


1 Answer

Just to remind people of good ol' tr:

asdf.tr('^A-Za-z0-9', '') 

This takes the complement of the listed character ranges (the leading ^ negates the set) and translates every matching character to '', which deletes it.
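As a quick sketch of the tr call on a concrete value: the input string here is invented for the example. String#delete accepts the same negated-set syntax and is shown alongside as an alternative the answer does not mention.

```ruby
# The sample input is hypothetical; any string works.
input = "Hello, World! 123_abc"

# tr with a negated set and an empty replacement removes every
# character that is not an ASCII letter or digit.
via_tr = input.tr('^A-Za-z0-9', '')

# delete takes the same negated-set syntax and states the intent directly.
via_delete = input.delete('^A-Za-z0-9')

puts via_tr      # HelloWorld123abc
puts via_delete  # HelloWorld123abc
```

Both calls return a new string and leave input untouched.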

I was curious whether a \W character class is faster than explicit ranges, and how gsub compares with tr:

require 'benchmark'

asdf = [('A'..'z').to_a, ('0'..'9').to_a].join

puts asdf
puts asdf.tr(   '^A-Za-z0-9',    '' )
puts asdf.gsub( /[\W_]+/,        '' )
puts asdf.gsub( /\W+/,           '' )
puts asdf.gsub( /\W/,            '' )
puts asdf.gsub( /[^A-Za-z0-9]+/, '' )
puts asdf.scan(/[a-z\d]/i).join

n = 100_000
Benchmark.bm(7) do |x|
  x.report("tr:")    { n.times do; asdf.tr('^A-Za-z0-9', '');      end }
  x.report("gsub1:") { n.times do; asdf.gsub(/[\W_]+/, '');        end }
  x.report("gsub2:") { n.times do; asdf.gsub(/\W+/, '');           end }
  x.report("gsub3:") { n.times do; asdf.gsub(/\W/, '');            end }
  x.report("gsub4:") { n.times do; asdf.gsub(/[^A-Za-z0-9]+/, ''); end }
  x.report("scan:")  { n.times do; asdf.scan(/[a-z\d]/i).join;     end }
end

>> ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
>> ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
>>              user     system      total        real
>> tr:      0.560000   0.000000   0.560000 (  0.557883)
>> gsub1:   0.510000   0.000000   0.510000 (  0.513244)
>> gsub2:   0.820000   0.000000   0.820000 (  0.823816)
>> gsub3:   0.960000   0.000000   0.960000 (  0.955848)
>> gsub4:   0.900000   0.000000   0.900000 (  0.902166)
>> scan:    5.630000   0.010000   5.640000 (  5.630990)

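The output shows that the two plain \W patterns (gsub2 and gsub3) leave the '_' in place, because the underscore is part of \w, so they don't meet the OP's request. Of the patterns that do, tr and gsub with /[\W_]+/ took about the same time in this run, while scan with join was roughly ten times slower.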
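To make the underscore difference concrete, here is a small sketch. The sample value, the params hash, and the helper name alphanumeric_only are all assumptions for illustration; the hash merely stands in for whatever the form submission provides.

```ruby
# Hypothetical sample value containing an underscore and punctuation.
sample = "user_name-42!"

# \W excludes the underscore, so it survives.
with_w = sample.gsub(/\W+/, '')

# Adding _ to the class, or listing the allowed ranges, removes it too.
with_w_underscore = sample.gsub(/[\W_]+/, '')
with_ranges       = sample.gsub(/[^A-Za-z0-9]+/, '')

# alphanumeric_only is a hypothetical helper; to_s guards against nil.
def alphanumeric_only(value)
  value.to_s.gsub(/[^A-Za-z0-9]+/, '')
end

# Stand-in for the params hash from a form submission.
params = { "nickname" => sample }
clean = alphanumeric_only(params["nickname"])

puts with_w             # user_name42
puts with_w_underscore  # username42
puts with_ranges        # username42
puts clean              # username42
```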

the Tin Man answered Sep 20 '22 17:09