In Ruby 1.9.3, I'm trying to write a program that finds all words of n characters that can be built from an arbitrary set of characters. For instance, if I'm given the characters [ b, a, h, s, v, i, e, y, k, s, a ] and n = 5, I need to find all 5-letter words that can be made using only those characters. Using the 2of4brif.txt word list from http://wordlist.sourceforge.net/ (to include British words and spellings, too), I have attempted the following code:
a = %w[b a h s v i e y k s a]
a.permutation(5).map(&:join).each do |x|
  File.open('2of4brif.txt').each_line do |line|
    puts line if line.match(/^[#{x}]+$/)
  end
end
This does nothing (no error message, no output, as if frozen). I have also attempted variations based on the following threads:
What's the best way to search for a string in a file?
Ruby find string in file and print result
How to search for exact matching string in a text file using Ruby?
Finding lines in a text file matching a regular expression
Match a content with regexp in a file?
How to open a file and search for a word?
Every variation I have tried has resulted in either:
1) Freezing;
2) Printing all words from the list that contain the 5-character permutations (I assume that's what it's doing; I didn't go through and check all of the thousands of printed words); or
3) Printing all 5-character permutations found within words in the list (again, I assume that's what it's doing).
Again, I'm not looking for words that contain the 5-character permutations. I'm looking for 5-character permutations that are complete words in and of themselves, so a line in the text file should only be printed if it is a perfect match with a permutation.
What am I doing wrong? Thanks in advance!
You're not really taking advantage of regular expressions here. Your program is very inefficient, not only because you re-open the file for every single permutation, as has been pointed out (and there are about 55k of them), but above all because all you need to do is test each line of the file once against
/^[bahsvieyksa]{5}$/
rather than once per permutation.
I would thus suggest:
# File.foreach closes the file when the block finishes.
File.foreach('2of4brif.txt') do |line|
  puts line if line.match(/^[bahsvieyksa]{5}$/)
end
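as a much more efficient alternative. Note that a character class does not limit how often a letter is used, so this pattern also matches words that repeat a letter more often than your set provides.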
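If the number of times each letter is available matters (the set in the question has one b, so "kebab" should not qualify even though every letter of it is in the set), one option is to compare letter counts instead of generating permutations. The following is only a sketch of that idea: the helper names letter_counts, buildable? and find_words are made up for illustration, and the file name is the one from the question.

```ruby
# Letters available, duplicates kept (two "s" and two "a" here).
LETTERS = %w[b a h s v i e y k s a]

# Count how often each character occurs, e.g. "sass" => {"s"=>3, "a"=>1}.
def letter_counts(chars)
  chars.each_with_object(Hash.new(0)) { |c, counts| counts[c] += 1 }
end

# True when `word` has exactly `length` characters and uses no character
# more often than `available` (a hash of counts) provides.
def buildable?(word, available, length)
  return false unless word.length == length
  letter_counts(word.chars).all? { |c, n| n <= available[c] }
end

# Return the words from `lines` (any enumerable of strings, one word each)
# that can be built from `letters`.
def find_words(lines, letters, length)
  available = letter_counts(letters)
  lines.map { |line| line.chomp.downcase }
       .select { |word| buildable?(word, available, length) }
end

# Hypothetical usage: runs only if the word list is present.
path = '2of4brif.txt'
puts find_words(File.foreach(path), LETTERS, 5) if File.exist?(path)
```

This reads the file once and does a small, fixed amount of work per word, so its cost does not grow with the number of permutations.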
This works for me using the english.0 file on that page (sorry, I couldn't find the specific file you mentioned):
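a = %w[b a h s v i e y k s a l d n]

# Build a lookup table of every 5-letter permutation up front.
dict = {}
a.permutation(5).each do |perm|
  dict[perm.join] = true
end

# Read the word list once, checking each word against the table.
File.foreach('english.0') do |line|
  # Non-bang methods: chomp! returns nil when there is nothing to strip.
  word = line.chomp.downcase
  puts word if dict[word]
end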
The structure should be pretty clear - I build the dictionary of permutations up front in one giant hash (you may need to revisit this depending on input sizes, but memory is cheap these days), and then I used the fact that the input was "one word per line" to simply key into that hash.
Also note, in my version, I read through the file only once. In yours you scan the file once per permutation, and there are thousands of permutations.
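To get a feel for how large that hash can become, here is a rough sketch of the arithmetic. The helper name permutation_count is made up for illustration. It counts ordered selections of k items out of n, which is an upper bound on the number of hash keys, since repeated letters produce duplicate strings.

```ruby
# Number of ordered selections of k items from n: n! / (n - k)!
def permutation_count(n, k)
  ((n - k + 1)..n).reduce(1) { |product, i| product * i }
end

puts permutation_count(11, 5)   # 55440      (the question's 11 letters)
puts permutation_count(14, 5)   # 240240     (the 14 letters used above)
puts permutation_count(14, 10)  # 3632428800 (too many to hold comfortably)
```

For short words the table stays small, but the count grows quickly with the word length, which is the point at which the up-front hash would need revisiting.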