
Removing vowels from string in Ruby

I am testing possible solutions for the following problem. The first two solutions I came up with, disemvowel and disemvowel_2, aren't running properly, and I'm trying to figure out why.

disemvowel_3 is by far my favorite solution, but I feel like I have no right to use it if I don't understand where I went wrong with my first two solutions.

Could anyone help me figure out what's wrong with the first two solutions?

# Write a function disemvowel(string), which takes in a string,
# and returns that string with all the vowels removed. Treat "y" as a
# consonant.

def disemvowel(string)
  string_array = string.split
  vowels = %w[aeiou]
  i = 0
  while i < string.length
    if vowels.include? string[i] == true
      string_array[i] =  " "
    end
    i +=1
  end

  new_string = string_array.join
  new_string = new_string.sub(/\s+/,"")
  return new_string
end


def disemvowel_2(string)
  string_array = string.split('')
  string_array.delete('a','e','i','o','u')
  return string_array.join('')
end

# This is my favorite solution.
def disemvowel_3(string)
  result = string.gsub(/[aeiou]/i, '')
  return result
end


#tests
puts disemvowel("foobar") 
puts disemvowel("ruby") 
puts disemvowel("aeiou") 
Dan asked Dec 10 '22 16:12


2 Answers

Minor changes would make disemvowel work correctly. This is what was fixed, and why:

Disemvowel

The Bugs

  1. split was changed to string.split(""). split with no arguments will split by spaces, and split("") will split by characters. With this change, the string_array becomes an array of each of the characters in the string. This can also be done more succinctly with string.chars, which is the preferred method.

See:

  • String#split
  • String#chars
  2. vowels was changed to a string. %w[] creates an array of words, so %w[aeiou] is actually an array containing the single string "aeiou". Array#include? on that array can never match a single character. Changing vowels to a plain string means String#include? can match against each character.

See:

  • %w[]
  • Array#include?
  • String#include?
  3. vowels.include? had no parens and was explicitly comparing to true. Without parentheses, Ruby treats everything after the method name as the argument, so the result of the expression string[i] == true was passed to vowels.include?, which wasn't what was intended.

A couple of style tips that can help with this:

  • comparisons to true should be implicit (e.g. don't use == true)
  • use parens when calling functions or methods.

See:

  • Omit parens for DSL and keywords; use around all other method invocations section of the Ruby Style Guide
  4. sub was changed to gsub. The call to sub makes only one replacement in a string, so for "foobar", whose intermediate string is "f  b r", only the first run of spaces is removed, leaving "fb r". gsub does "global substitution", which is exactly what you want in this case.

See:

  • String#gsub
  • String#sub
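
Each of the four bugs above can be reproduced in isolation. The following is a minimal sketch in plain Ruby; the variable names are illustrative and do not appear in the original code.

```ruby
# 1. split with no argument splits on whitespace; split("") and chars split per character.
whole_word = "foobar".split       # => ["foobar"]
per_char   = "foobar".split("")   # => ["f", "o", "o", "b", "a", "r"]
same_chars = "foobar".chars       # same result as split("")

# 2. %w[aeiou] is an array holding one word, not five letters.
word_list    = %w[aeiou]               # => ["aeiou"]
in_word_list = word_list.include?("a") # => false
in_string    = "aeiou".include?("a")   # => true

# 3. Without parentheses, the whole expression `"a" == true` becomes the argument,
#    so this is the same as word_list.include?("a" == true).
unparenthesized = word_list.include? "a" == true   # => false

# 4. sub replaces only the first match; gsub replaces every match.
first_only = "f  b r".sub(/\s+/, "")    # => "fb r"
all_runs   = "f  b r".gsub(/\s+/, "")   # => "fbr"

p whole_word, per_char, same_chars
p in_word_list, in_string, unparenthesized
p first_only, all_runs
```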

First working version

The working disemvowel function looks like this:

def disemvowel(string)
  string_array = string.split("")
  vowels = "aeiou"
  i = 0
  while i < string.length
    if vowels.include?(string[i])
      string_array[i] =  " "
    end
    i +=1
  end

  new_string = string_array.join
  new_string = new_string.gsub(/\s+/,"")
  return new_string
end

and produces this output with your tests (the third test, "aeiou", prints an empty line):

fbr
rby

Cleaning up

  1. Support mixed-case vowels.

    def disemvowel_1_1(string)
      string_array = string.split("")
      vowels = "aeiouAEIOU"
      i = 0
      while i < string.length
        if vowels.include?(string[i])
          string_array[i] = " "
        end
        i += 1
      end

      new_string = string_array.join
      new_string = new_string.gsub(/\s+/, "")
      return new_string
    end

  2. Consistent use of string_array instead of intermingling with string. Various uses of string occur when it's more appropriate to use string_array, instead. This should be replaced.

    def disemvowel_1_2(string)
      string_array = string.split("")
      vowels = "aeiouAEIOU"
      i = 0
      while i < string_array.length
        if vowels.include?(string_array[i])
          string_array[i] = " "
        end
        i += 1
      end

      new_string = string_array.join
      new_string = new_string.gsub(/\s+/, "")
      return new_string
    end

  3. Don't use a variable for "aeiou". This is a constant expression, and should either be written as a string literal or a constant. In this case, a literal string will be chosen, as there's no enclosing scope to constrain the use of a constant in the global namespace (in case this code gets inserted into another context).

    def disemvowel_1_3(string)
      string_array = string.split("")
      i = 0
      while i < string_array.length
        if "aeiouAEIOU".include?(string_array[i])
          string_array[i] = " "
        end
        i += 1
      end

      new_string = string_array.join
      new_string = new_string.gsub(/\s+/, "")
      return new_string
    end

  4. Replace the vowel character with nil instead of " " to eliminate the gsub replacement.

    def disemvowel_1_4(string)
      string_array = string.split("")
      i = 0
      while i < string_array.length
        if "aeiouAEIOU".include?(string_array[i])
          string_array[i] = nil
        end
        i += 1
      end

      new_string = string_array.join
      return new_string
    end

  5. Convert the while loop to Array#each_with_index to process the array elements

    def disemvowel_1_5(string)
      string_array = string.split("")
      string_array.each_with_index do |char, i|
        if "aeiouAEIOU".include?(char)
          string_array[i] = nil
        end
      end

      new_string = string_array.join
      return new_string
    end

  6. Replace the use of split("") with String#chars to get the array of characters to process.

    def disemvowel_1_6(string)
      string_array = string.chars
      string_array.each_with_index do |char, i|
        if "aeiouAEIOU".include?(char)
          string_array[i] = nil
        end
      end

      new_string = string_array.join
      return new_string
    end

  7. Reduce the number of temporary variables by chaining results. This can minimize the number of individual variables that Ruby has to keep track of and reduce the variable lookup that occurs each time a variable name is referenced.

    def disemvowel_1_7(string)
      string_array = string.chars
      string_array.each_with_index do |char, i|
        if "aeiouAEIOU".include?(char)
          string_array[i] = nil
        end
      end

      return string_array.join
    end

  8. Remove the explicit return to use Ruby's expression-based return values.

    def disemvowel_1_8(string)
      string_array = string.chars
      string_array.each_with_index do |char, i|
        if "aeiouAEIOU".include?(char)
          string_array[i] = nil
        end
      end.join
    end

  9. Use Array#map to process characters, rather than Array#each_with_index.

    def disemvowel_1_9(string)
      string.chars.map {|char| "aeiouAEIOU".include?(char) ? nil : char }.join
    end
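
Steps 4 through 9 work because Array#join converts nil to an empty string, so the nil placeholders simply vanish from the result. Here is a small sketch of that behavior, alongside Array#reject as an alternative that never creates the placeholders. The method names strip_with_nil and strip_with_reject are hypothetical, chosen only for this illustration.

```ruby
# Replace vowels with nil and rely on join dropping them (the disemvowel_1_9 approach).
def strip_with_nil(string)
  string.chars.map { |char| "aeiouAEIOU".include?(char) ? nil : char }.join
end

# Drop vowels up front, so no placeholder is needed at all.
def strip_with_reject(string)
  string.chars.reject { |char| "aeiouAEIOU".include?(char) }.join
end

p ["f", nil, nil, "b", nil, "r"].join   # => "fbr"
p strip_with_nil("FooBar")              # => "FBr"
p strip_with_reject("FooBar")           # => "FBr"
```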

Disemvowel 2

The Bugs

  1. Replace delete with delete_if. Array#delete takes a single argument and removes only the elements equal to it, so calling it with five arguments raises an ArgumentError; to make it work you would have to loop over the vowels and call delete once per vowel. Array#delete_if, however, deletes every element for which its block returns true, and that condition here is "aeiou".include?(element).

See:

  • Array#delete
  • Array#delete_if
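
To make the difference concrete, here is a minimal sketch of the three behaviors described above. The helper names delete_each_vowel and delete_vowels_if are hypothetical and exist only for this illustration.

```ruby
# Array#delete accepts exactly one object, so five arguments raise ArgumentError.
begin
  "foobar".chars.delete('a', 'e', 'i', 'o', 'u')
rescue ArgumentError => error
  puts "ArgumentError: #{error.message}"
end

# Calling delete once per vowel works, at the cost of one pass per vowel.
def delete_each_vowel(string)
  characters = string.chars
  %w[a e i o u].each { |vowel| characters.delete(vowel) }
  characters.join
end

# delete_if removes every element for which the block returns true, in one pass.
def delete_vowels_if(string)
  string.chars.delete_if { |char| "aeiou".include?(char) }.join
end

p delete_each_vowel("foobar")   # => "fbr"
p delete_vowels_if("foobar")    # => "fbr"
```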

First working version

def disemvowel_2(string)
  string_array = string.split('')
  string_array.delete_if {|element| "aeiou".include?(element) }
  string_array.join('')
end 

Cleaning up

  1. Support mixed-case vowels.

    def disemvowel_2_1(string)
      string_array = string.split('')
      string_array.delete_if {|element| "aeiouAEIOU".include?(element) }
      string_array.join('')
    end

  2. Replace the use of split("") with String#chars to get the array of characters to process.

    def disemvowel_2_2(string)
      string_array = string.chars
      string_array.delete_if {|element| "aeiouAEIOU".include?(element) }
      string_array.join('')
    end

  3. Change join('') to just join. Array#join already uses an empty separator by default, so the extra parameter is redundant.

    def disemvowel_2_3(string)
      string_array = string.chars
      string_array.delete_if {|element| "aeiouAEIOU".include?(element) }
      string_array.join
    end

  4. Reduce the number of temporary variables by chaining results. This can minimize the number of individual variables that Ruby has to keep track of and reduce the variable lookup that occurs each time a variable name is referenced.

    def disemvowel_2_4(string)
      string.chars.delete_if {|element| "aeiouAEIOU".include?(element) }.join
    end

Disemvowel 4

String has a delete method that will remove all matching characters. Given the vowels, this is a straightforward implementation:

def disemvowel_4(string)
  string.delete("aeiouAEIOU")
end

See:

  • String#delete
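
A few properties of String#delete are worth knowing before relying on it. The sketch below uses made-up sample strings to show that it returns a new string, supports negated sets, and intersects multiple arguments.

```ruby
# String#delete returns a new string; the receiver is left untouched.
original = "Ruby on Rails"
stripped = original.delete("aeiouAEIOU")
p stripped   # => "Rby n Rls"
p original   # => "Ruby on Rails"

# A leading ^ negates the set: delete everything that is NOT a lowercase vowel.
p "foobar".delete("^aeiou")   # => "ooa"

# With several arguments, only characters in the intersection of the sets are removed.
p "hello".delete("l", "lo")   # => "heo"
```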

Testing

I created a unit-test-like program to do programmatic self-testing, rather than just displaying the disemvoweled strings on the console. It tests each version of the function and reports whether it passes or fails (disemvowel_1 here is the first working version of disemvowel shown above):

data = [
  ["foobar", "fbr"],
  ["ruby", "rby"],
  ["aeiou", ""],
  ["AeIoU", ""],
]

data.each do |test|
  puts "disemvowel_1   #{disemvowel_1(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_1 #{disemvowel_1_1(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_2 #{disemvowel_1_2(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_3 #{disemvowel_1_3(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_4 #{disemvowel_1_4(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_5 #{disemvowel_1_5(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_6 #{disemvowel_1_6(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_7 #{disemvowel_1_7(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_8 #{disemvowel_1_8(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_1_9 #{disemvowel_1_9(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_2   #{disemvowel_2(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_2_1 #{disemvowel_2_1(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_2_2 #{disemvowel_2_2(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_2_3 #{disemvowel_2_3(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_2_4 #{disemvowel_2_4(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_3   #{disemvowel_3(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
  puts "disemvowel_4   #{disemvowel_4(test[0]) == test[1] ? 'Pass' : 'Fail'}: '#{test[0]}'"
end

This will produce the following output:

>$ ruby disemvowel.rb
disemvowel_1   Pass: 'foobar'
disemvowel_1_1 Pass: 'foobar'
disemvowel_1_2 Pass: 'foobar'
disemvowel_1_3 Pass: 'foobar'
disemvowel_1_4 Pass: 'foobar'
disemvowel_1_5 Pass: 'foobar'
disemvowel_1_6 Pass: 'foobar'
disemvowel_1_7 Pass: 'foobar'
disemvowel_1_8 Pass: 'foobar'
disemvowel_1_9 Pass: 'foobar'
disemvowel_2   Pass: 'foobar'
disemvowel_2_1 Pass: 'foobar'
disemvowel_2_2 Pass: 'foobar'
disemvowel_2_3 Pass: 'foobar'
disemvowel_2_4 Pass: 'foobar'
disemvowel_3   Pass: 'foobar'
disemvowel_4   Pass: 'foobar'
disemvowel_1   Pass: 'ruby'
disemvowel_1_1 Pass: 'ruby'
disemvowel_1_2 Pass: 'ruby'
disemvowel_1_3 Pass: 'ruby'
disemvowel_1_4 Pass: 'ruby'
disemvowel_1_5 Pass: 'ruby'
disemvowel_1_6 Pass: 'ruby'
disemvowel_1_7 Pass: 'ruby'
disemvowel_1_8 Pass: 'ruby'
disemvowel_1_9 Pass: 'ruby'
disemvowel_2   Pass: 'ruby'
disemvowel_2_1 Pass: 'ruby'
disemvowel_2_2 Pass: 'ruby'
disemvowel_2_3 Pass: 'ruby'
disemvowel_2_4 Pass: 'ruby'
disemvowel_3   Pass: 'ruby'
disemvowel_4   Pass: 'ruby'
disemvowel_1   Pass: 'aeiou'
disemvowel_1_1 Pass: 'aeiou'
disemvowel_1_2 Pass: 'aeiou'
disemvowel_1_3 Pass: 'aeiou'
disemvowel_1_4 Pass: 'aeiou'
disemvowel_1_5 Pass: 'aeiou'
disemvowel_1_6 Pass: 'aeiou'
disemvowel_1_7 Pass: 'aeiou'
disemvowel_1_8 Pass: 'aeiou'
disemvowel_1_9 Pass: 'aeiou'
disemvowel_2   Pass: 'aeiou'
disemvowel_2_1 Pass: 'aeiou'
disemvowel_2_2 Pass: 'aeiou'
disemvowel_2_3 Pass: 'aeiou'
disemvowel_2_4 Pass: 'aeiou'
disemvowel_3   Pass: 'aeiou'
disemvowel_4   Pass: 'aeiou'
disemvowel_1   Fail: 'AeIoU'
disemvowel_1_1 Pass: 'AeIoU'
disemvowel_1_2 Pass: 'AeIoU'
disemvowel_1_3 Pass: 'AeIoU'
disemvowel_1_4 Pass: 'AeIoU'
disemvowel_1_5 Pass: 'AeIoU'
disemvowel_1_6 Pass: 'AeIoU'
disemvowel_1_7 Pass: 'AeIoU'
disemvowel_1_8 Pass: 'AeIoU'
disemvowel_1_9 Pass: 'AeIoU'
disemvowel_2   Pass: 'AeIoU'
disemvowel_2_1 Pass: 'AeIoU'
disemvowel_2_2 Pass: 'AeIoU'
disemvowel_2_3 Pass: 'AeIoU'
disemvowel_2_4 Pass: 'AeIoU'
disemvowel_3   Pass: 'AeIoU'
disemvowel_4   Pass: 'AeIoU'
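
The repeated puts lines can be collapsed by looking the methods up by name. This is a sketch of one way to do that, not the harness that produced the output above; run_checks is a hypothetical helper, and only two of the implementations are repeated here so the example runs on its own.

```ruby
def disemvowel_3(string)
  string.gsub(/[aeiou]/i, '')
end

def disemvowel_4(string)
  string.delete("aeiouAEIOU")
end

# Hypothetical helper: runs every named method against every [input, expected] pair
# and returns one report line per combination.
def run_checks(method_names, cases)
  method_names.flat_map do |name|
    cases.map do |input, expected|
      status = send(name, input) == expected ? 'Pass' : 'Fail'
      "#{name} #{status}: '#{input}'"
    end
  end
end

cases = [["foobar", "fbr"], ["ruby", "rby"], ["aeiou", ""], ["AeIoU", ""]]
puts run_checks(%i[disemvowel_3 disemvowel_4], cases)
```

Adding another implementation to the run then only requires adding its name to the list of symbols.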

Benchmarking

I wrote a benchmark program to test the performance of each implementation. Here's the benchmark program:

require 'benchmark'

Times = 5_000
chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!@#$%^&*(),./<>?;':\"[]{}\\|-=_+`~".chars
array = Times.times.map { |n| "#{chars.sample(n)}" }

puts "============================================================="
puts RUBY_DESCRIPTION

Benchmark.bm(15) do |x|
  x.report("disemvowel_1:")   { array.each {|s| disemvowel_1(s) } }
  x.report("disemvowel_1_1:") { array.each {|s| disemvowel_1_1(s) } }
  x.report("disemvowel_1_2:") { array.each {|s| disemvowel_1_2(s) } }
  x.report("disemvowel_1_3:") { array.each {|s| disemvowel_1_3(s) } }
  x.report("disemvowel_1_4:") { array.each {|s| disemvowel_1_4(s) } }
  x.report("disemvowel_1_5:") { array.each {|s| disemvowel_1_5(s) } }
  x.report("disemvowel_1_6:") { array.each {|s| disemvowel_1_6(s) } }
  x.report("disemvowel_1_7:") { array.each {|s| disemvowel_1_7(s) } }
  x.report("disemvowel_1_8:") { array.each {|s| disemvowel_1_8(s) } }
  x.report("disemvowel_1_9:") { array.each {|s| disemvowel_1_9(s) } }
  x.report("disemvowel_2:")   { array.each {|s| disemvowel_2(s) } }
  x.report("disemvowel_2_1:") { array.each {|s| disemvowel_2_1(s) } }
  x.report("disemvowel_2_2:") { array.each {|s| disemvowel_2_2(s) } }
  x.report("disemvowel_2_3:") { array.each {|s| disemvowel_2_3(s) } }
  x.report("disemvowel_2_4:") { array.each {|s| disemvowel_2_4(s) } }
  x.report("disemvowel_3:")   { array.each {|s| disemvowel_3(s) } }
  x.report("disemvowel_4:")   { array.each {|s| disemvowel_4(s) } }
end

And this is the output from the benchmarks. In the run recorded here, the reports for disemvowel_1_2 through disemvowel_1_9 and disemvowel_2_1 through disemvowel_2_4 all called disemvowel_1_1, so those rows repeat the disemvowel_1_1 measurement rather than timing their own functions:

=============================================================
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
                      user     system      total        real
disemvowel_1:     2.630000   0.010000   2.640000 (  3.487851)
disemvowel_1_1:   2.300000   0.010000   2.310000 (  2.536056)
disemvowel_1_2:   2.360000   0.010000   2.370000 (  2.651750)
disemvowel_1_3:   2.290000   0.010000   2.300000 (  2.449730)
disemvowel_1_4:   2.320000   0.020000   2.340000 (  2.599105)
disemvowel_1_5:   2.360000   0.010000   2.370000 (  2.473005)
disemvowel_1_6:   2.340000   0.010000   2.350000 (  2.813744)
disemvowel_1_7:   2.380000   0.030000   2.410000 (  3.663057)
disemvowel_1_8:   2.330000   0.010000   2.340000 (  2.525702)
disemvowel_1_9:   2.290000   0.010000   2.300000 (  2.494189)
disemvowel_2:     2.490000   0.000000   2.490000 (  2.591459)
disemvowel_2_1:   2.310000   0.010000   2.320000 (  2.503748)
disemvowel_2_2:   2.340000   0.010000   2.350000 (  2.608350)
disemvowel_2_3:   2.320000   0.010000   2.330000 (  2.820086)
disemvowel_2_4:   2.330000   0.010000   2.340000 (  2.735653)
disemvowel_3:     0.070000   0.000000   0.070000 (  0.070498)
disemvowel_4:     0.020000   0.000000   0.020000 (  0.018580)

Conclusion

String#delete outperforms every hand-rolled solution by more than 100x and, in this run, is roughly 3.5 times faster than String#gsub (0.02 s versus 0.07 s total). It's very easy to use and outperforms everything else; this is easily the best solution.
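
To re-check the gsub versus delete comparison on your own machine, a smaller benchmark is enough. The following is a sketch under stated assumptions: the input set (1,000 random 50-character alphanumeric strings), the repetition count, and the names with_gsub and with_delete are all made up for this example, and no timings are claimed here because they depend on the Ruby version and hardware.

```ruby
require 'benchmark'

# Hypothetical input set: 1,000 random strings built from letters and digits.
alphabet = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a
inputs = Array.new(1_000) { Array.new(50) { alphabet.sample }.join }

def with_gsub(string)
  string.gsub(/[aeiou]/i, '')
end

def with_delete(string)
  string.delete("aeiouAEIOU")
end

# Both approaches must agree before their timings are worth comparing.
mismatches = inputs.reject { |s| with_gsub(s) == with_delete(s) }
puts "mismatches: #{mismatches.size}"

Benchmark.bm(12) do |x|
  x.report("gsub:")   { 100.times { inputs.each { |s| with_gsub(s) } } }
  x.report("delete:") { 100.times { inputs.each { |s| with_delete(s) } } }
end
```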

Michael Gaskill answered Dec 29 '22 12:12


Your first solution is needlessly convoluted and has some errors, both in code and in style.

  • You break your string into an array of separate characters, but do it incorrectly with string_array = string.split. Use string_array = string.split(''), string_array = string.chars (the best option), or string_array = string.each_char.to_a instead. "asdfg".split returns ["asdfg"], not ['a','s','d','f','g'] as you seem to expect.
  • Then you don't use this (supposed) array, but keep using the original string. If you intended to do this, why split the original string at all?
  • Finally, you go back to working with the array, changing it according to what happened in the original string. As you can see, you are working with more objects than needed. This violates the KISS principle, and the code not running properly is a consequence.

Your second solution, although much simpler than the first one, has the problem engineersmnky pointed out: Array#delete takes exactly one argument, not five.

Finally, your third solution, although working fine, could be written in a much simpler way:

def disemvowel_3(string)
    string.gsub(/[aeiou]/i, '')
end

As I keep telling people here, you don't need an explicit return at the end of a Ruby method. By default, a method returns the value of the last expression it evaluates, whatever that is.
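
A short sketch of the two forms side by side; the method names disemvowel_explicit and disemvowel_implicit are hypothetical and used only to keep both versions in one file.

```ruby
# Explicit return, as written in the question.
def disemvowel_explicit(string)
  result = string.gsub(/[aeiou]/i, '')
  return result
end

# Implicit return: the gsub call is the last expression, so its value is returned.
def disemvowel_implicit(string)
  string.gsub(/[aeiou]/i, '')
end

p disemvowel_explicit("Foobar")   # => "Fbr"
p disemvowel_implicit("Foobar")   # => "Fbr"
```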

Another possible solution, if you allow me to suggest, would be using Array#reject in the following way:

def disemvowel(str)
  vowels = %w[a e i o u]
  str.each_char.to_a.reject{ |item| vowels.include?(item) }.join
end
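
Note that the reject version above removes only lowercase vowels, whereas the gsub version with the /i flag removes both cases. Here is a sketch of one way to make it case-insensitive; disemvowel_any_case is a hypothetical name, not part of the original answer.

```ruby
def disemvowel_any_case(str)
  vowels = %w[a e i o u]
  # Compare the lowercased character so that "A" and "a" are both rejected.
  str.each_char.reject { |char| vowels.include?(char.downcase) }.join
end

p disemvowel_any_case("AeIoU Ruby")   # => " Rby"
```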
Ed de Almeida answered Dec 29 '22 13:12