
Find and print lines in a file exactly matching string or regexp (Ruby)

In Ruby 1.9.3, I'm trying to write a program that finds all words of n characters that can be built from an arbitrary set of characters. For instance, given the characters [ b, a, h, s, v, i, e, y, k, s, a ] and n = 5, I need to find all 5-letter words that can be made using only those characters. Using the 2of4brif.txt word list from http://wordlist.sourceforge.net/ (to include British words and spellings, too), I have attempted the following code:

a = %w[b a h s v i e y k s a]
a.permutation(5).map(&:join).each do |x|
  File.open('2of4brif.txt').each_line do |line|
    puts line if line.match(/^[#{x}]+$/)
  end
end

This does nothing (no error message, no output, as if frozen). I have also attempted variations based on the following threads:

What's the best way to search for a string in a file?

Ruby find string in file and print result

How to search for exact matching string in a text file using Ruby?

Finding lines in a text file matching a regular expression

Match a content with regexp in a file?

How to open a file and search for a word?

Every variation I have tried has resulted in either:

1) Freezing;

2) Printing all words from the list that contain the 5-character permutations (I assume that's what it's doing; I didn't go through and check all of the thousands of printed words); or

3) Printing all 5-character permutations found within words in the list (again, I assume that's what it's doing).

Again, I'm not looking for words that contain the 5-character permutations. I'm looking for 5-character permutations that are complete words in and of themselves, so a line in the text file should only be printed if it matches a permutation exactly.

What am I doing wrong? Thanks in advance!

grandinero asked Oct 21 '22 16:10


2 Answers

You’re not really using regular expressions here. Your program is very inefficient, not only because it re-opens and rescans the file for every single permutation, as has been pointed out (and there are 55k of them!), but above all because a single pattern does the job. All you need is to test each line of the file once against

/^[bahsvieyksa]{5}$/

which matches a line made of exactly five characters taken from your set.

I would thus suggest:

# File.foreach reads line by line and closes the file when done.
File.foreach('2of4brif.txt') do |line|
  puts line if line.match(/^[bahsvieyksa]{5}$/)
end

as a much more efficient alternative.
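
One caveat: a character class only checks which letters appear, not how often each one is available, so the pattern above would also accept a word such as "eaves" (two e's, while the set has only one). A minimal sketch of a per-letter count check that could be used instead of, or after, the regexp follows. The helper names `letter_counts` and `buildable?` and the sample lines are hypothetical, chosen for illustration; the code avoids newer methods such as `tally` so it should also run on 1.9.

```ruby
LETTERS = %w[b a h s v i e y k s a]

# Count how many times each character occurs.
def letter_counts(chars)
  counts = Hash.new(0)
  chars.each { |c| counts[c] += 1 }
  counts
end

# True if `word` has exactly `n` letters and uses no letter more often
# than it appears in `available` (a Hash of letter => count, default 0).
def buildable?(word, available, n)
  return false unless word.length == n
  letter_counts(word.chars).all? { |c, k| k <= available[c] }
end

AVAILABLE = letter_counts(LETTERS)

# Assumed sample data standing in for the lines of the word list.
sample_lines = ["shave\n", "eaves\n", "bikes\n", "shaves\n", "hello\n"]

matches = sample_lines.map(&:chomp).select { |w| buildable?(w, AVAILABLE, 5) }
puts matches
```

To run this against the real list, `sample_lines` could presumably be swapped for the lines yielded by `File.foreach('2of4brif.txt')`.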

Arthur Reutenauer answered Oct 24 '22 18:10


This works for me using the english.0 file on that page (sorry, I couldn't find the specific file you mentioned):

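a = %w[b a h s v i e y k s a l d n]

# Build a lookup table of every 5-letter permutation up front.
dict = {}
a.permutation(5).each do |p|
  dict[p.join] = true
end

# Single pass over the word list: normalise each line, then look it up.
File.foreach('english.0') do |line|
  # chomp! and downcase! return nil when nothing changes, so chaining
  # them can raise NoMethodError; use the non-bang versions instead.
  word = line.chomp.downcase
  puts word if dict[word]
end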

The structure should be pretty clear: I build the dictionary of permutations up front in one giant hash (you may need to revisit this depending on input sizes, but memory is cheap these days), and then I use the fact that the input is "one word per line" to simply key into that hash.

Also note that my version reads through the file only once. Yours scans the whole file once per permutation, and there are tens of thousands of permutations.
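
To put a number on that, here is a quick sketch that counts the permutations for the 11 letters from the question (not the 14 I used above), along with how many distinct strings they spell once duplicates from the repeated `s` and `a` collapse. The variable names are just for illustration.

```ruby
letters = %w[b a h s v i e y k s a]

# Ordered selections of 5 out of 11 elements: 11 * 10 * 9 * 8 * 7 = 55440.
total = letters.permutation(5).count

# Joining and de-duplicating collapses permutations that spell the same
# string, which happens because `s` and `a` each occur twice in the set.
distinct = letters.permutation(5).map(&:join).uniq.length

puts "permutations: #{total}"
puts "distinct strings: #{distinct}"
```

The distinct count is also the number of keys a hash like `dict` would end up holding for that letter set, since identical strings overwrite each other.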

Dave S. answered Oct 24 '22 17:10