
Replacing string with id using dictionary in python

I have a dictionary file that contains a word in each line.

titles-sorted.txt

 a&a    
 a&b    
 a&c_bus    
 a&e    
 a&f    
 a&m    
 ....

For each word, its line number is the word's id.

Then I have another file in which each line contains a set of words separated by tabs.

a.txt

 a_15   a_15_highway_(sri_lanka)    a_15_motorway   a_15_motorway_(germany) a_15_road_(sri_lanka)

I'd like to replace each word with its id if the word exists in the dictionary, so that the output looks like:

    3454    2345    123   5436     322 .... 

So I wrote this Python code to do it:

 f = open("titles-sorted.txt")
 lines = f.readlines()
 titlemap = {}
 nr = 1
 for l in lines:
     l = l.replace("\n", "")
     titlemap[l.lower()] = nr
     nr+=1

 fw = open("a.index", "w")
 f = open("a.txt")
 lines = f.readlines()
 for l in lines:
     tokens = l.split("\t")
     if tokens[0] in titlemap.keys():
            fw.write(str(titlemap[tokens[0]]) + "\t")
            for t in tokens[1:]:
                    if t in titlemap.keys():
                            fw.write(str(titlemap[t]) + "\t")
            fw.write("\n")

 fw.close()
 f.close()

But this code is ridiculously slow, which makes me suspect that I have not done everything right.

Is this an efficient way to do this?

pandagrammer asked Apr 30 '26 11:04


1 Answer

The write loop makes many calls to write, which is usually inefficient. You can probably speed things up by writing only once per line (or once per file, if the file is small enough):

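# strip the trailing newline so the last token can match a dictionary key
tokens = l.rstrip("\n").split("\t")
# build the whole output line first, then write it with a single call
fw.write('\t'.join(str(titlemap[t]) for t in tokens if t in titlemap))
fw.write("\n")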

or even:

lines = []
for l in f:
    # collect one tab-joined line of ids per input line
    lines.append('\t'.join(str(titlemap[t]) for t in l.rstrip('\n').split('\t') if t in titlemap))
# write everything with a single call
fw.write('\n'.join(lines))

Also, if the ids are looked up more than once, you can save time by converting them to strings once, when you read the dictionary file:

titlemap = {l.strip().lower(): str(index) for index, l in enumerate(f, start=1)}
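
Putting these suggestions together, a minimal sketch could look like the following. The function names (build_titlemap, index_lines, convert) are hypothetical and only for illustration. It also tests membership with `t in titlemap` rather than `t in titlemap.keys()`: in Python 2, `keys()` builds a list, so each test scans the whole dictionary, which is a likely cause of the slowness if that is the version in use. Note one assumption that differs from the code in the question: this sketch writes an output line for every input line, even when the first word is not in the dictionary.

```python
def build_titlemap(title_lines):
    """Map each lowercased title to its 1-based line number, stored as a string."""
    return {line.strip().lower(): str(index)
            for index, line in enumerate(title_lines, start=1)}


def index_lines(lines, titlemap):
    """Yield one tab-joined string of ids per input line, skipping unknown words."""
    for line in lines:
        tokens = line.rstrip("\n").split("\t")
        # `t in titlemap` is a hash lookup; no call to .keys() is needed
        yield "\t".join(titlemap[t] for t in tokens if t in titlemap)


def convert(titles_path, words_path, out_path):
    """Read the dictionary and the word file, then write the id file."""
    with open(titles_path) as f:
        titlemap = build_titlemap(f)
    with open(words_path) as f, open(out_path, "w") as fw:
        # writelines consumes the generator, so each line is written as one string
        fw.writelines(row + "\n" for row in index_lines(f, titlemap))
```

With the file names from the question, this would be called as convert("titles-sorted.txt", "a.txt", "a.index").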

njzk2 answered May 01 '26 23:05


