I have this code that should open and read two text files, and match when a word is present in both. The match is represented by printing "SUCCESS" and by writing the word to a temp.txt file.
dir = open('listac.txt','r')
path = open('paths.txt','r')
paths = path.readlines()
paths_size = len(paths)
matches = open('temp.txt','w')
dirs = dir.readlines()
for pline in range(0,len(paths)):
    for dline in range(0,len(dirs)):
        p = paths[pline].rstrip('\n').split(".")[0].replace(" ", "")
        dd = dirs[dline].rstrip('\n').replace(" ", "")
        #print p.lower()
        #print dd.lower()
        if (p.lower() == dd.lower()):
            print "SUCCESS\n"
            matches.write(str(p).lower() + '\n')
listac.txt is formatted as
/teetetet
/eteasdsa
/asdasdfsa
/asdsafads
.
.
...etc
paths.txt is formatted as
/asdadasd.php/asdadas/asdad/asd
/adadad.html/asdadals/asdsa/asd
.
.
...etc
Hence I use the split function to get the first /asadasda (within paths.txt) before the dot. The problem is that the words never seem to match. I have even printed out each comparison before the if statement, and the values look equal. Is there something else that Python does before comparing strings?
=======
Thanks everyone for the help. As you suggested, I cleaned up the code, so it ended up like this:
dir = open('listac.txt','r')
path = open('paths.txt','r')
#paths = path.readlines()
#paths_size = len(paths)
for line in path:
    p = line.rstrip().split(".")[0].replace(" ", "")
    for lines in dir:
        d = str(lines.rstrip())
        if p == d:
            print p + " = " + d
Apparently, having p declared and initialized before entering the second for loop makes a difference in the comparison down the road. When I declared p and d within the second for loop, it wouldn't work. I don't know the reason for that, but if someone does, I am listening :)
Thanks again!
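One possible explanation for the difference, offered as an assumption since the actual files are not shown: the first version strips only '\n' with rstrip('\n'), while the cleaned-up version calls rstrip() with no argument, which removes all trailing whitespace, including a carriage return left over from Windows-style line endings. A stray '\r' is invisible when printed, so two strings can look equal and still compare unequal. The sample lines below are made up for illustration, and the sketch is written for Python 3.

```python
# Hypothetical sample lines; the '\r\n' ending is an assumption about the real files.
path_line = "/asdadasd.php/asdadas/asdad/asd\r\n"
dir_line = "/asdadasd\r\n"

# Stripping only '\n' leaves the carriage return on the directory entry.
p_old = path_line.rstrip('\n').split(".")[0].replace(" ", "")
d_old = dir_line.rstrip('\n').replace(" ", "")

# rstrip() with no argument removes every trailing whitespace character.
p_new = path_line.rstrip().split(".")[0].replace(" ", "")
d_new = dir_line.rstrip()

# repr() makes hidden characters such as '\r' visible.
print(repr(p_old), repr(d_old), p_old == d_old)
print(repr(p_new), repr(d_new), p_new == d_new)
```

Printing repr() of each value before the comparison, rather than the value itself, is a quick way to check whether hidden characters are involved.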
Since we're reading the entire data files into memory anyway, why not use sets and take the intersection?
def format_data(x):
    return x.rstrip().replace(' ','').split('.')[0].lower()
with open('listac.txt') as dirFile:
    dirStuff = set( format_data(dline) for dline in dirFile )
with open('paths.txt') as pathFile:
    intersection = dirStuff.intersection( format_data(pline) for pline in pathFile )
# open the output file here; the original snippet used matches without defining it
with open('temp.txt','w') as matches:
    for elem in intersection:
        print "SUCCESS\n"
        matches.write(str(elem)+"\n")
I've used the same format_data function for both data sets, since they look more or less the same, but you can use more than one function if you please. Also note that this solution reads only one of the two files into memory: the generator expression passed to intersection is consumed lazily, one line at a time.
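To make the set approach easy to try without the real files, here is a minimal Python 3 sketch. The helper name find_matches and the sample data are hypothetical, and io.StringIO objects stand in for listac.txt and paths.txt.

```python
import io

def format_data(line):
    # Same normalisation as in the answer: strip trailing whitespace,
    # drop spaces, keep the part before the first dot, lowercase.
    return line.rstrip().replace(' ', '').split('.')[0].lower()

def find_matches(dir_lines, path_lines):
    """Return the set of normalised entries present in both iterables."""
    dir_entries = set(format_data(line) for line in dir_lines)
    # intersection() accepts any iterable, so the second source is
    # consumed line by line rather than being loaded into a list first.
    return dir_entries.intersection(format_data(line) for line in path_lines)

# Made-up stand-ins for listac.txt and paths.txt.
dir_file = io.StringIO("/teetetet\n/asdadasd\n/adadad\n")
path_file = io.StringIO(
    "/asdadasd.php/asdadas/asdad/asd\n"
    "/adadad.html/asdadals/asdsa/asd\n"
    "/nomatch.php/x\n"
)

matches = find_matches(dir_file, path_file)
print(sorted(matches))
```

Because the result is a set, duplicates in paths.txt collapse to a single entry and the iteration order of the result is not guaranteed.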
As pointed out in the comments, this does not make any attempt to preserve the order. However, if you really need to preserve the order, try this:
<snip>
...
</snip>
with open('paths.txt') as pathFile:
    for line in pathFile:
        if format_data(line) in dirStuff:
            print "SUCCESS\n"
            #...
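A self-contained Python 3 sketch of the order-preserving idea follows. It is not a reconstruction of the snipped code; the generator name ordered_matches and the sample data are hypothetical. The directory entries go into a set for fast lookups, and the paths are then scanned in file order.

```python
import io

def format_data(line):
    # Strip trailing whitespace, drop spaces, keep the part before the first dot.
    return line.rstrip().replace(' ', '').split('.')[0].lower()

def ordered_matches(dir_lines, path_lines):
    """Yield normalised path entries that also appear in dir_lines, in path order."""
    dir_entries = set(format_data(line) for line in dir_lines)
    for line in path_lines:
        key = format_data(line)
        if key in dir_entries:
            yield key

# Made-up stand-ins for listac.txt and paths.txt.
dir_file = io.StringIO("/teetetet\n/asdadasd\n/adadad\n")
path_file = io.StringIO(
    "/adadad.html/asdadals/asdsa/asd\n"
    "/nomatch.php/x\n"
    "/asdadasd.php/asdadas/asdad/asd\n"
)

result = list(ordered_matches(dir_file, path_file))
print(result)
```

Unlike the set intersection, this keeps duplicates: a path that occurs twice in paths.txt is reported twice.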
I'd have to see more of your data set to see why you aren't getting matches. I've refactored some of your code to be more pythonic.
dirFile = open('listac.txt','r')
pathFile = open('paths.txt','r')
paths = pathFile.readlines()
dirs = dirFile.readlines()
matches = open('temp.txt','w')
for pline in paths:
    p = pline.rstrip('\n').split(".")[0].replace(" ", "")
    for dline in dirs:
        dd = dline.rstrip('\n').replace(" ", "")
        #print p.lower()
        #print dd.lower()
        if p.lower() == dd.lower():
            print "SUCCESS\n"
            matches.write(str(p).lower() + '\n')