I am a newbie to programming, and have been studying Python in my spare time for the past few months. I decided I was going to try and create a little script that converts American spellings to English spellings in a text file.
I have been trying all sorts of things for the past 5 hours, and eventually came up with something that got me somewhat closer to my goal, though not quite there!
# imported dictionary contains 1800 english:american spelling key:value pairs.
from english_american_dictionary import dict

def replace_all(text, dict):
    for english, american in dict.iteritems():
        text = text.replace(american, english)
    return text

my_text = open('test_file.txt', 'r')

for line in my_text:
    new_line = replace_all(line, dict)
    output = open('output_test_file.txt', 'a')
    print >> output, new_line

output.close()
I am sure there is a considerably better way to go about things, but for this script, here are the issues I am having:
Any help appreciated for this eager newb!
The contents of the test_file.txt are:
I am sample file.
I contain an english spelling: colour.
3 american spellings on 1 line: color, analyze, utilize.
1 american spelling on 1 line: familiarize.
The extra blank line you are seeing is because you are using print to write out a line that already includes a newline character at the end. Since print writes its own newline too, your output becomes double spaced. An easy fix is to use outfile.write(new_line) instead.
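To make the difference visible, here is a small sketch. It assumes Python 3 (where print is a function with a file argument, unlike the print >> output form in the question) and uses in-memory io.StringIO buffers as stand-ins for the output file:

```python
import io

line = "colour\n"  # a line read from a file keeps its trailing newline

# print() appends its own newline, so the line is followed by a blank line.
printed = io.StringIO()
print(line, file=printed)

# write() emits the string exactly as given.
written = io.StringIO()
written.write(line)

print(repr(printed.getvalue()))  # 'colour\n\n'
print(repr(written.getvalue()))  # 'colour\n'
```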
As for the file modes, the issue is that you're opening the output file over and over. You should just open it once, at the start. It's usually a good idea to use with statements to handle opening files, since they'll take care of closing them for you when you're done with them.
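If you want to keep the line-by-line loop, a rough sketch of that advice might look like the following. It assumes Python 3 (items() rather than iteritems()); convert_file and sample_dict are made-up names for illustration, and a temporary directory stands in for your real files:

```python
import os
import tempfile

def replace_all(text, spelling_dict):
    # Same approach as the question: swap each American spelling for its English one.
    for english, american in spelling_dict.items():
        text = text.replace(american, english)
    return text

def convert_file(in_path, out_path, spelling_dict):
    # Both files are opened exactly once; the with block closes them on exit.
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        for line in in_file:
            # write() adds no newline of its own, so the output is not double spaced.
            out_file.write(replace_all(line, spelling_dict))

# Hypothetical two-entry dictionary in the question's english:american order.
sample_dict = {'colour': 'color', 'analyse': 'analyze'}

with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, 'test_file.txt')
    out_path = os.path.join(tmp_dir, 'output_test_file.txt')
    with open(in_path, 'w') as f:
        f.write('color, analyze.\nfamiliarize.\n')
    convert_file(in_path, out_path, sample_dict)
    with open(out_path) as f:
        result = f.read()

print(result)
```

Because the output file is opened in 'w' mode, rerunning the script overwrites the previous output instead of appending to it.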
I don't understand your other issue, with only some of the replacements happening. Is your dictionary missing the spellings for 'analyze' and 'utilize'?
One suggestion I'd make is to not do your replacements line by line. You can read the whole file in at once with file.read() and then work on it as a single unit. This will probably be faster, since it won't need to loop as often over the items in your spelling dictionary (just once, rather than once per line):
with open('test_file.txt', 'r') as in_file:
    text = in_file.read()

with open('output_test_file.txt', 'w') as out_file:
    out_file.write(replace_all(text, spelling_dict))
Edit:
To make your code correctly handle words that contain other words (like "entire" containing "tire"), you probably need to abandon the simple str.replace approach in favor of regular expressions.
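To see the problem concretely, here is a tiny sketch (the sample sentence is made up) comparing plain substring replacement with a regular expression anchored at word boundaries:

```python
import re

text = 'the entire tire'

# Plain substring replacement also rewrites the "tire" inside "entire".
naive = text.replace('tire', 'tyre')

# \b anchors the match at word boundaries, so only the standalone word changes.
bounded = re.sub(r'\btire\b', 'tyre', text)

print(naive)    # the entyre tyre
print(bounded)  # the entire tyre
```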
Here's a quickly thrown together solution that uses re.sub, given a dictionary of spelling changes from American to British English (that is, in the reverse order of your current dictionary):
import re

#from english_american_dictionary import ame_to_bre_spellings
ame_to_bre_spellings = {'tire':'tyre', 'color':'colour', 'utilize':'utilise'}

def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer

def ame_to_bre(text):
    pattern = r'\b\w+\b'  # this pattern matches whole words only
    replacer = replacer_factory(ame_to_bre_spellings)
    return re.sub(pattern, replacer, text)

def main():
    #with open('test_file.txt') as in_file:
    #    text = in_file.read()
    text = 'foo color, entire, utilize'

    #with open('output_test_file.txt', 'w') as out_file:
    #    out_file.write(ame_to_bre(text))
    print(ame_to_bre(text))

if __name__ == '__main__':
    main()
One nice thing about this code structure is that you can easily convert from British English spellings back to American English ones, if you pass a dictionary in the other order to the replacer_factory function.
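As a rough sketch of that idea: the reversed dictionary can be built by swapping keys and values, assuming each British spelling appears only once as a value (otherwise entries would collide). The convert function and the bre_to_ame_spellings name are hypothetical additions for illustration, not part of the code above:

```python
import re

ame_to_bre_spellings = {'tire': 'tyre', 'color': 'colour', 'utilize': 'utilise'}

# Swap keys and values; assumes no two American words share a British spelling.
bre_to_ame_spellings = {bre: ame for ame, bre in ame_to_bre_spellings.items()}

def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        # Words not in the dictionary are returned unchanged.
        return spelling_dict.get(word, word)
    return replacer

def convert(text, spelling_dict):
    # Hypothetical variant of ame_to_bre that takes the dictionary as an argument.
    return re.sub(r'\b\w+\b', replacer_factory(spelling_dict), text)

british = convert('foo color, entire, utilize', ame_to_bre_spellings)
american = convert(british, bre_to_ame_spellings)

print(british)   # foo colour, entire, utilise
print(american)  # foo color, entire, utilize
```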