I have a text file with one sentence per line. I would like to lemmatize the words in each line using hunspell (the -s option). Since I want the lemmas of each line separately, it wouldn't make sense to submit the whole text file to hunspell at once. I need to send one line at a time and get the hunspell output for each line.
Following the answers to "How to process input and output streams in Steel Bank Common Lisp?", I was able to send the whole text file to hunspell one line after another, but I was not able to capture hunspell's output for each line. How can I interact with the process, sending a line and reading its output before sending the next line?
My current code to read the whole text file is:
(defun parse-spell-sb (file-in)
  (with-open-file (in file-in)
    (let ((p (sb-ext:run-program "/opt/local/bin/hunspell" (list "-i" "UTF-8" "-s" "-d" "pt_BR")
                                 :input in :output :stream :wait nil)))
      (when p
        (unwind-protect
             (with-open-stream (o (process-output p))
               (loop
                 :for line := (read-line o nil nil)
                 :while line
                 :collect line))
          (process-close p))))))
Once more, this code gives me the output of hunspell for the whole text file. I would like to have the output of hunspell for each input line separately.
Any idea?
I suppose you have a buffering problem with the program you want to run. For example:
(defun program-stream (program &optional args)
  (let ((process (sb-ext:run-program program args
                                     :input :stream
                                     :output :stream
                                     :wait nil
                                     :search t)))
    (when process
      (make-two-way-stream (sb-ext:process-output process)
                           (sb-ext:process-input process)))))
Now, on my system, this will work with cat:
CL-USER> (defparameter *stream* (program-stream "cat"))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*) ; will hang without this
NIL
CL-USER> (read-line *stream*)
"foo bar baz"
NIL
CL-USER> (close *stream*)
T
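Notice the finish-output call: without it, the read will hang, because the line may still be sitting in the Lisp-side output buffer. (There's also force-output, which starts the flush but does not wait for it to complete.)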
Python in interactive mode will work, too:
CL-USER> (defparameter *stream* (program-stream "python" '("-i")))
*STREAM*
CL-USER> (loop while (read-char-no-hang *stream*)) ; skip startup message
NIL
CL-USER> (format *stream* "1+2~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*)
"3"
NIL
CL-USER> (close *stream*)
T
But if you try this without the -i option (or similar options like -u), you'll probably be out of luck, because of the buffering going on. For example, on my system, reading from tr will hang:
CL-USER> (defparameter *stream* (program-stream "tr" '("a-z" "A-Z")))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*) ; hangs
; Evaluation aborted on NIL.
CL-USER> (read-char-no-hang *stream*)
NIL
CL-USER> (close *stream*)
T
Since tr doesn't provide a switch to turn off buffering, we'll wrap the call with a pty wrapper (in this case unbuffer from expect):
CL-USER> (defparameter *stream* (program-stream "unbuffer"
                                                '("-p" "tr" "a-z" "A-Z")))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*)
"FOO BAR BAZ
"
NIL
CL-USER> (close *stream*)
T
So, long story short: try using finish-output on the stream before reading. If that doesn't work, check for command-line options that prevent buffering. If it still doesn't work, you could try wrapping the program in some kind of pty wrapper.
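To tie this back to the question, here is a minimal sketch of the send-one-line, read-the-reply loop, built on the program-stream function from this answer. The send-and-read function and its reply-lines parameter are hypothetical names introduced here. The sketch assumes you know how many lines the child process prints per input line: for cat that is exactly one. For hunspell the count depends on its mode and on the input, so you would need to replace the fixed count with whatever end-of-reply marker hunspell actually emits.

```lisp
;; PROGRAM-STREAM is repeated from the answer so the sketch is self-contained.
(defun program-stream (program &optional args)
  "Start PROGRAM and return a two-way stream tied to its stdin and stdout."
  (let ((process (sb-ext:run-program program args
                                     :input :stream
                                     :output :stream
                                     :wait nil
                                     :search t)))
    (when process
      (make-two-way-stream (sb-ext:process-output process)
                           (sb-ext:process-input process)))))

;; Hypothetical helper: REPLY-LINES is an assumption about the child's
;; protocol (one reply line per input line by default, as with cat).
(defun send-and-read (stream line &key (reply-lines 1))
  "Write LINE to STREAM, flush it, then collect up to REPLY-LINES reply lines."
  (write-line line stream)
  (finish-output stream)                ; without this the read below may hang
  (loop :for i :from 0 :below reply-lines
        :for reply := (read-line stream nil nil)
        :while reply                    ; stop early if the child closes its output
        :collect reply))

;; Usage with cat, which echoes each line back as soon as it receives it.
(defun echo-lines (lines)
  "Send each of LINES to cat separately and return one reply list per line."
  (let ((stream (program-stream "cat")))
    (unwind-protect
         (loop :for line :in lines
               :collect (send-and-read stream line))
      (close stream))))

(echo-lines '("foo bar" "baz"))
;; => (("foo bar") ("baz"))
```

Each input line gets its own result list, which is the per-line separation the question asks for. If the child buffers its output, the same loop will still hang on the read until the buffering is dealt with as described above.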