I have this piece of Python code that loops through a list of URLs in a text file (urls.txt), follows the redirects of each URL, and, if the final URL contains a specific string, writes it to a file called redirect.txt:
import urllib.request
import ssl

redf = open('redirect.txt', 'w')
# Skip certificate verification so redirects on hosts with bad certs still resolve
context = ssl._create_unverified_context()

with open('urls.txt') as f:
    for row in f:
        finalurl = ''
        try:
            # Follow redirects; geturl() returns the final URL after redirection
            res = urllib.request.urlopen(row.strip(), context=context, timeout=10)
            finalurl = res.geturl().strip()
        except Exception:
            print("error: " + row.strip())
        if finalurl:
            if "/admin/" in finalurl:
                redf.write(finalurl + "\n")
The problem is that I have to wait for the entire list of URLs to be processed before the redirect.txt file is created.
How can I write to the file in real time?
The file is created, but since your output is small, it's likely all stuck in the write buffer until the file is closed. If you need the file to be filled in more promptly, either open it in line-buffered mode by passing buffering=1:
open('redirect.txt', 'w', buffering=1)
or flush after each write, either by explicitly calling flush:
redf.write(finalurl+"\n")
redf.flush()
or, since you're adding newlines anyway, by letting print supply them for you and passing flush=True:
print(finalurl, file=redf, flush=True)
Side-note: You really want to use with statements for files opened for writing in particular, but you only used one for the file being read (where it's less critical, since the worst case is a delayed handle close, not lost writes). Without it, an exception can arbitrarily delay the file being flushed and closed. Just combine the two opens into one with statement, e.g.:
with open('urls.txt') as f, open('redirect.txt', 'w', buffering=1) as redf:
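Putting it together, here is a minimal sketch of how your loop might look with both opens combined and the output line-buffered (keeping your urls.txt/redirect.txt names and the /admin/ check; failed URLs are simply skipped rather than printed):

import urllib.request
import ssl

context = ssl._create_unverified_context()

with open('urls.txt') as f, open('redirect.txt', 'w', buffering=1) as redf:
    for row in f:
        try:
            res = urllib.request.urlopen(row.strip(), context=context, timeout=10)
            finalurl = res.geturl().strip()
        except Exception:
            continue  # skip URLs that fail to resolve
        if "/admin/" in finalurl:
            # buffering=1 flushes on each newline, so this write is visible immediately
            redf.write(finalurl + "\n")

Both files are closed automatically even if an exception escapes the loop, so no matched URL is lost in the buffer.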