I am trying to read the contents of a URL synchronously for a simple command-line batch script in Swift. I am using cURL for simplicity's sake - I know I could use NSURLSession if I had to. I am also building this with swift build using the open-source version of Swift on OS X.
The problem is that for certain URLs, the NSTask never terminates if standard output has been redirected to a pipe.
// This will hang, and when terminated with Ctrl-C reports "(23) Failed writing body"
import Foundation
let task = NSTask()
let pipe = NSPipe()
task.launchPath = "/usr/bin/curl"
task.arguments = ["http://trove.nla.gov.au/newspaper/page/21704647"]
task.standardOutput = pipe
task.launch()
task.waitUntilExit()
However, if you remove the pipe, or change the URL, the task succeeds.
// This will succeed - no pipe
import Foundation
let task = NSTask()
task.launchPath = "/usr/bin/curl"
task.arguments = ["http://trove.nla.gov.au/newspaper/page/21704647"]
task.launch()
task.waitUntilExit()
// This will succeed - different URL
import Foundation
let task = NSTask()
let pipe = NSPipe()
task.launchPath = "/usr/bin/curl"
task.arguments = ["http://trove.nla.gov.au/newspaper/page/21704646"]
task.standardOutput = pipe
task.launch()
task.waitUntilExit()
Running any of the examples directly using curl from Terminal succeeds, so there is something about the interaction with NSTask, when retrieving from that specific URL (and a few others), and when a pipe is present, that is causing cURL to fail.
Using the NSTask class, your program can run another program as a subprocess and can monitor that program’s execution. An NSTask object creates a separate executable entity; it’s different from NSThread because it doesn’t share memory space with the process that creates it.
A process operates within an environment defined by the current values for several items: the current directory, standard input, standard output, standard error, and the values of any environment variables. By default, an NSTask object inherits its environment from the process that launches it.
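For illustration, those defaults can be overridden before launching the task. A minimal sketch, with purely illustrative working-directory and environment values:
import Foundation

let task = NSTask()
task.launchPath = "/usr/bin/printenv"
// Override the inherited working directory and environment (illustrative values).
task.currentDirectoryPath = "/tmp"
task.environment = ["LANG": "en_US.UTF-8", "MY_FLAG": "1"]
task.launch()
task.waitUntilExit()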
Expanding a little on @Hod's answer: The standard output of the launched process is redirected to a pipe, but your program never reads from the other pipe end. A pipe has a limited buffer, see for example How big is the pipe buffer? which explains that the pipe buffer size on macOS is (at most) 64KB.
If the pipe buffer is full then the launched process cannot write to it anymore. If the process uses blocking I/O, then a write() to the pipe will block until at least one byte can be written. That never happens in your case, so the process hangs and does not terminate.
The problem can occur only if the amount written to standard output exceeds the pipe buffer size, which explains why it happens only with certain URLs and not with others.
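That the URL itself is not special can be seen by reproducing the hang with any command that writes more than the pipe buffer; a minimal sketch, using /bin/dd purely as an illustration:
import Foundation

// dd writes 1 MB of zero bytes to standard output, far more than the ~64 KB pipe buffer.
let task = NSTask()
let pipe = NSPipe()
task.launchPath = "/bin/dd"
task.arguments = ["if=/dev/zero", "bs=1024", "count=1024"]
task.standardOutput = pipe
task.launch()
task.waitUntilExit()   // hangs: nothing reads from the pipe, so dd blocks on write()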
As a solution, you can read from the pipe, e.g. with
let data = pipe.fileHandleForReading.readDataToEndOfFile()
before waiting for the process to terminate (a complete sketch of this approach follows at the end of this answer). Another option is to use asynchronous reading, e.g. with the code from Real time NSTask output to NSTextView with Swift:
pipe.fileHandleForReading.readabilityHandler = { fh in
    let data = fh.availableData
    // process data ...
}
That would also allow you to read both standard output and standard error from a process via pipes without blocking.
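Putting the synchronous option together with the code from the question, a minimal sketch (the string decoding at the end may differ slightly depending on the Swift/Foundation version) could look like this:
import Foundation

let task = NSTask()
let pipe = NSPipe()
task.launchPath = "/usr/bin/curl"
task.arguments = ["http://trove.nla.gov.au/newspaper/page/21704647"]
task.standardOutput = pipe
task.launch()

// Drain the pipe first so curl can never block on a full pipe buffer ...
let data = pipe.fileHandleForReading.readDataToEndOfFile()

// ... and only then wait for the process to terminate.
task.waitUntilExit()

if let output = NSString(data: data, encoding: NSUTF8StringEncoding) {
    print(output)
}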
Both curl and NSPipe buffer data. Based on the error you're getting when you ctrl-c out (which indicates curl couldn't write the expected amount of data), you've got a bad interaction between these.
Try adding the -N option to curl to prevent it from buffering its output.
curl also outputs progress information. I don't think that's causing the problem, but you might add -s so that you only get the data, just in case.
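Concretely, that would mean adjusting the arguments from the question along these lines (a sketch; -s is --silent and -N is --no-buffer, and the pipe is still read before waiting, as described in the answer above):
import Foundation

let task = NSTask()
let pipe = NSPipe()
task.launchPath = "/usr/bin/curl"
// -s silences the progress meter, -N disables curl's output buffering.
task.arguments = ["-s", "-N", "http://trove.nla.gov.au/newspaper/page/21704647"]
task.standardOutput = pipe
task.launch()

// Drain the pipe before waiting so curl cannot block on a full pipe buffer.
let data = pipe.fileHandleForReading.readDataToEndOfFile()
task.waitUntilExit()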