Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to find the number of unique elements in a string

Tags:

string

ruby

How can I find unique elements in a string in the best way?

Sample string format is

myString = "34345667543"

o/p

  ['3','4','3','5'.....]
like image 724
Mohit Jain Avatar asked Aug 08 '11 10:08

Mohit Jain


People also ask

How do you count the number of unique values in a list?

The number of unique values in a list is the number of elements excluding duplicates. For example, the number of unique values in [1, 1, 2, 2, 3] is 3.

How do you count unique characters in a string in Python?

Python3. If not present append the characters to empty string. Now the empty string consists of only unique characters, use len() method to display the length.


2 Answers

This is an interesting question, and since it returns so many almost similar results, I did a simple benchmark to decide which is actually the best solution:

require 'rubygems'
require 'benchmark'
require 'set'

puts "Do the test"

Benchmark.bm(40) do |x|

  STRING_TEST = "26263636362626218118181111232112233"

  x.report("do split and uniq") do
    (1..1000000).each { STRING_TEST.split(//).uniq }
  end

  x.report("do chars to_a uniq") do
    (1..1000000).each { STRING_TEST.chars.to_a.uniq }
  end

  x.report("using Set") do
    (1..1000000).each { Set.new(STRING_TEST.split('')).to_a }
  end

end

and the results of this test are, not entirely surprising (0n 1.8.7p352):

                                              user     system      total        real
do split and uniq                        27.060000   0.000000  27.060000 ( 27.084629)
do chars to_a uniq                       14.440000   0.000000  14.440000 ( 14.452377)
using Set                                41.740000   0.000000  41.740000 ( 41.760313)

and on 1.9.2p180 :

                                              user     system      total        real
do split and uniq                        19.260000   0.000000  19.260000 ( 19.242727)
do chars to_a uniq                        8.980000   0.010000   8.990000 (  8.983891)
using Set                                28.220000   0.000000  28.220000 ( 28.186787)

The results for REE (1.8.7) are close to 1.9.2 :

                                              user     system      total        real
do split and uniq                        19.120000   0.000000  19.120000 ( 19.126034)
do chars to_a uniq                       14.740000   0.010000  14.750000 ( 14.766540)
using Set                                32.770000   0.120000  32.890000 ( 32.921878)

For fun, I also tried on rubinius:

                                              user     system      total        real
do split and uniq                        26.100000   0.000000  26.100000 ( 26.651468)
do chars to_a uniq                       25.680000   0.000000  25.680000 ( 25.780944)
using Set                                22.500000   0.000000  22.500000 ( 22.649291)

So while the split('\\').uniq wins points for readability, the chars.to_a.uniq is almost double as fast.

It is weird to notice that on rubinius the Set solution is the fastest, but no where near as fast as the chars.to_a.uniq on 1.9.2.

like image 74
nathanvda Avatar answered Nov 15 '22 11:11

nathanvda


Use this short:

myString.split(//).uniq
like image 40
Emiliano Poggi Avatar answered Nov 15 '22 10:11

Emiliano Poggi