I am using urllib2's build_opener() to create an OpenerDirector. I am using the OpenerDirector to fetch a slow page, and so it has a large timeout.
So far, so good.
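For reference, the setup described above looks roughly like this. This is a sketch using Python 3's urllib.request (the successor to Python 2's urllib2; build_opener() has the same API in both); the URL and timeout value are illustrative.

```python
import urllib.request

# build_opener() returns an OpenerDirector with the default handlers installed.
opener = urllib.request.build_opener()

# A slow page needs a generous timeout; the timeout is passed per request.
# (URL and timeout are placeholders.)
# response = opener.open("http://example.com/slow-page", timeout=300)
# data = response.read()
```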
However, in another thread, I have been told to abort the download - let's say the user has selected to exit the program in the GUI.
Is there a way to signal that a urllib2 download should quit?
There is no clean answer. There are several ugly ones.
Initially, I was putting rejected ideas in the question. As it has become clear that there are no right answers, I decided to post the various sub-optimal alternatives as a list answer. Some of these are inspired by comments, thank you.
An ideal solution would be if OpenerDirector offered a cancel operation. It does not. Library writers, take note: if you provide long, slow operations, you need to provide a way to cancel them if people are to use them in real-world applications.
As a general solution for others, a shorter timeout may work: with a smaller timeout, the download is more responsive to changes in circumstances. However, it will also cause downloads to fail if they aren't completely finished within the timeout, so this is a trade-off. In my situation, it is untenable.
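The trade-off above can be sketched as a retry loop with a small per-attempt timeout, checking an abort flag between attempts. The function and parameter names here are illustrative; `open_url` stands in for any callable with the signature of `OpenerDirector.open()`.

```python
import socket
import threading

def fetch_with_small_timeout(open_url, url, abort, attempt_timeout=5.0, max_attempts=10):
    """Repeatedly attempt a download with a small timeout, checking an
    abort flag between attempts.  As noted above, each timed-out attempt
    restarts the download from scratch, so a page that genuinely needs
    longer than attempt_timeout will never complete."""
    for _ in range(max_attempts):
        if abort.is_set():
            return None               # the user cancelled between attempts
        try:
            return open_url(url, timeout=attempt_timeout).read()
        except socket.timeout:
            continue                  # too slow this time round; try again
    return None
```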
Again, as a general solution, reading in chunks may work. If the download consists of very large files, you can read them in small chunks and abort after a chunk is read.
Unfortunately, if (as in my case) the delay is in receiving the first byte, rather than in the size of the file, this will not help.
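The chunked approach can be sketched as follows. The function name is illustrative; `response` is any file-like object, such as what `OpenerDirector.open()` returns.

```python
import threading

def read_abortable(response, abort, chunk_size=8192):
    """Read a response in small chunks, checking an abort flag between
    chunks.  As noted above, this only helps once bytes are flowing:
    the open() call and the first read() can still block for the full
    timeout if the server is slow to send the first byte."""
    chunks = []
    while not abort.is_set():
        chunk = response.read(chunk_size)
        if not chunk:                  # end of stream: download complete
            return b"".join(chunks)
        chunks.append(chunk)
    return None                        # aborted mid-download
```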
While there are some aggressive techniques to kill threads, depending on the operating system, they are not recommended. In particular, they can cause deadlocks to occur. See Eli Bendersky's two articles (via @JBernardo).
If the abort operation has been triggered by the user, it may be simplest to just be unresponsive, and not act on the request until the open operation has completed.
Whether this unresponsiveness is acceptable to your users (hint: no!), is up to your project.
It also continues to place a demand on the server, even if the result is known to be unneeded.
If you create a separate thread to run the operation, and then communicate with that thread in an interruptable manner, you could discard the blocked thread, and start working on the next operation instead. Eventually, the thread will unblock and then it can gracefully shut-down.
The thread should be a daemon, so it doesn't block the total shut-down of the application.
This gives the user responsiveness, but it means the server will need to keep serving the request, even though the result is not needed.
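The abandon-the-thread pattern can be sketched like this. The names are illustrative; `blocking_fetch` stands in for the real `OpenerDirector.open()` call, and a Queue carries the result so the caller can stop waiting whenever it likes.

```python
import queue
import threading

def fetch_in_discardable_thread(blocking_fetch, wait_seconds):
    """Run a blocking fetch in a daemon thread and wait a limited time
    for its result.  If the wait expires (say, because the user
    cancelled), the caller simply stops listening; the daemon thread
    unblocks on its own schedule, and being a daemon it does not
    prevent interpreter shutdown."""
    results = queue.Queue()

    def worker():
        results.put(blocking_fetch())

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    try:
        return results.get(timeout=wait_seconds)
    except queue.Empty:
        return None    # abandoned: the thread is still running, but we move on
```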
As described in @Luke's answer, it may be possible to provide (fragile?, unportable?) extensions to the standard Python libraries.
His solution changes the socket operations from blocking to polling. Another approach might allow shutdown through the socket.shutdown() method (if that will, indeed, interrupt a blocked socket - not tested).
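As a small experiment on that open question: on a connected socket pair, calling shutdown() from another thread does make a blocked recv() return immediately with b'' (end-of-stream), at least on common platforms. This is a sketch, not a guarantee; behaviour can vary across operating systems.

```python
import socket
import threading

def shutdown_unblocks_recv():
    """Block one thread in recv(), then call shutdown() on the same
    socket from another thread, and report what recv() returned."""
    a, b = socket.socketpair()
    received = []

    def reader():
        received.append(a.recv(1024))   # blocks: nothing has been sent

    t = threading.Thread(target=reader)
    t.start()
    threading.Event().wait(0.2)         # give the reader time to block
    a.shutdown(socket.SHUT_RDWR)        # sabotage from the other thread
    t.join(timeout=5)
    a.close()
    b.close()
    return received                     # [b''] if recv() was unblocked
```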
A solution based on Twisted may be cleaner. See below.
The Twisted framework provides a replacement set of libraries for network operations that are event-driven. I understand this means that all of the different communications can be handled by a single thread with no blocking.
It may be possible to navigate the OpenerDirector to find the base-level socket that is blocking, and sabotage it directly (will socket.shutdown() be sufficient?) to make it return.
Yuck.
The thread that reads the socket can be moved into a separate process, and interprocess communication can be used to transmit the result. This IPC can be aborted early by the client, and then the whole process can be killed.
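The separate-process idea can be sketched with the multiprocessing module. Unlike a thread, a process blocked on a socket can simply be terminated. The names here are illustrative, and fetch_worker is a placeholder for the real download code.

```python
import multiprocessing

def fetch_worker(url, results):
    """Stand-in for the real download, run in a child process.  In real
    use this would call OpenerDirector.open() and put the body on the
    queue."""
    results.put("contents of " + url)   # placeholder result

def fetch_in_killable_process(url, wait_seconds):
    """Run the download in a child process and wait a limited time for
    the result over a queue.  If the user aborts, the blocked process
    can be terminated outright, which a thread cannot be."""
    results = multiprocessing.Queue()
    p = multiprocessing.Process(target=fetch_worker, args=(url, results))
    p.start()
    try:
        return results.get(timeout=wait_seconds)
    except Exception:                   # queue.Empty: caller gave up waiting
        return None
    finally:
        p.terminate()                   # kill the worker outright
        p.join()
```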
If you have control over the web server being read, it could be sent a separate message asking it to close the socket. That should cause the blocked client to react.