Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unable to provide long (1024+ character) inputs to the OCaml toplevel and coqtop (and Proof General)

Edit 4: It turns out that this is actually just a limitation of TTY input in general; there's nothing specific about OCaml, Coq, or Emacs which is causing the problem.


I'm working on a Coq program using Proof General in Emacs, and I've found a bug with input that's too long. If a region to submit to coqtop through Proof General contains more than 1023 characters, Proof General (though not Emacs) hangs while waiting for a response, and the *coq* buffer contains one extra ^G character for every character over 1023. For instance, if a 1025-character region was sent to coqtop, then the *coq* buffer would end with the two extra characters ^G^G. I can't proceed past this point in the file, and I have to kill the coqtop process (either with C-c C-x or a kill/killall from the terminal).

Something about this limitation arises from coqtop itself. If one generates a 1024-character or longer string and pipes it in, such as by running

perl -e 'print ("Eval simpl in " . (" " x 1024) . "1.\n")' | coqtop

then everything works fine. (Similarly, coqc works fine as well.) However, if I run coqtop in a terminal, I can't type more than 1024 characters on one line, including the ending return character. Thus, typing a 1023-character line and then hitting return works; but after typing 1024 characters, hitting any key, including return (but not including delete, etc.), does nothing but produce a beep. And it turns out that ocaml (the OCaml toplevel) has the same behavior:

perl -e 'print ((" " x 1024) . "1;;")' | ocaml

works fine, but I can't type more than 1024 characters on one line if running ocaml from a terminal. Since my understanding is that coqtop relies on the OCaml toplevel (more obviously when run as coqtop -byte), I imagine this is a related limitation.

The relevant software versions are:

  • OCaml 3.12.1 from Homebrew;
  • Coq 8.3pl3 (and 8.3pl2) from Homebrew;
  • Proof General 4.1;
  • The build of GNU Emacs 24.1.1 from Emacs for Mac OS X; and
  • Mac OS X 10.6.7.

And my questions are:

  • What about ocaml and coqtop is enforcing this character limit? And why only for input from the terminal or Emacs, as opposed to input from a pipe or a file?
  • Why does Proof General's (apparent) ignorance of this limit cause hanging errors and mysterious ^Gs?
  • How can I work around this limitation? My end goal is to use Coq inside Proof General/Emacs, so workarounds which dodge the underlying issue are fine.

Edit 3: After finding that the 1024-character input limitation also exists in the Ocaml toplevel (something which I imagine is related), I've added that information and deleted the original problem description, as it's been completely obscured and superseded. (See the edit history if necessary).

like image 668
Antal Spector-Zabusky Avatar asked Jul 04 '12 02:07

Antal Spector-Zabusky


2 Answers

I reported this as issue 5678 on the OCaml bug tracker, and the user dim explained that this wasn't a problem with OCaml per se, but was a limitation of TTY input. The issue is this. Since text isn't sent to running commands until the user hits return, all that waiting input must be stored somewhere. The buffer it's stored in, called the input queue or the type-ahead buffer, has a fixed size, which is controlled by the C constant MAX_INPUT. This constant is equal to 1024 on Mac OS X. Buffering like this allows useful processing of input, such as deleting characters before they're sent. All commands run from the terminal which don't do something special (such as using the readline library) will exhibit this behavior; for example, cat chokes in exactly the same way.

To avoid this behavior, one can unset the ICANON flag, for instance by running stty -icanon; this puts the TTY into non-canonical input mode, where input is not processed at all before being sent to the command. This means that editing becomes impossible: delete, left and right arrows, and so on all enter their literal equivalents (^?, ^[[D, ^[[C, …); similarly, ⌃D no longer sends an EOF, but merely a literal control character. However, for my particular use case, this (so far!) seems to be ideal, since Emacs is handling all my input for me. (Edit: But there's a better option!) (Libraries like readline, as I understand it, change this setting as well, but watch for control characters and handle editing, etc., themselves.) To restore canonical mode, one can run stty icanon.

The ledit tool wraps line editing around the program given to it as an argument, so ledit coqtop works fine (if oddly; I prefer ledit -l 65536 to avoid its scrolling), but interacted oddly with Emacs. The rlwrap tool does the same thing, but leaves the other program reading from a TTY; thus, while it can receive longer inputs, hitting enter and sending them to the wrapped command behaves very strangely and ends up necessitating that the command be killed.

Edit: In my specific use-case, I can also simply tell Emacs to use a pipe instead of a PTY, solving the problems at a stroke. The Emacs variable process-connection-type controls how subsidiary processes are communicated with; nil means to use a pipe, and non-nil means to use a TTY. Proof General uses the variable proof-shell-process-connection-type to determine how this should be set. Using a pipe solves all the 1024-character-limit problems.

like image 86
Antal Spector-Zabusky Avatar answered Sep 29 '22 11:09

Antal Spector-Zabusky


I'm not sure how the Emacs/coqtop interaction plays here, but I believe there is indeed an OCaml toplevel bug, and it should be reported in the OCaml bugtracker. Would you be ready to report it? If not, I can take care of it.

What about ocaml and coqtop is enforcing this character limit?

There are various input buffer at play in the toplevel code, some of them of length 1024; after a quick look at the code, there is a resizing logic in case the input gets too large, so it should work. I have been able to reproduce the "can't type more than N characters in the interactive toplevel" issue (when not using rlwrap), but with a N=4096 limit rather than N=1024, so I'm unsure it's the exact same problem.

And why only for input from the terminal or Emacs, as opposed to input from a pipe or a file?

The toplevel code makes a distinction between interactive and non-interactive input; iirc. it affects the way error locations are printed, for example.

Why does Proof General's (apparent) ignorance of this limit cause hanging errors and mysterious ^Gs?

I have no idea. The coqtop problem you're observing may even be a different bug (than the ocaml one) caused by a similar buffering logic.

How can I work around this limitation?

How about not sending too long inputs at once in Proof General? Maybe you can factorize your code to use intermediary definitions or something, to stay below the limit.

Regarding the "upstream fix" situation: both OCaml and Coq are, I believe, in the process of getting a new version soon. If people are interested enough in the bug to get a fix real soon (in particular, if you find the fix yourself), it could be integrated upstream reasonably quickly. Otherwise you'll have to wait for the next release cycle, and possibly maintain a local fork in the meantime to avoid the issue. Pragmatically, the "workaround by changing my Coq development" choice is probably the lowest-effort solution, but it won't benefit humanity as whole!

Edit: (to answer comments)

The resizing logic I was thinking of is in Lexing.lex_refill, found in stdlib/lexing.ml, called by the closure created by Lexing.from_function, called from toplevel/toploop.ml.

I have another "workaround" idea: write your long phrase to an external file foo.v, and use Load foo. to get the toplevel to read the file itself. I suspect this will work around the size limitation, but have not tested.

like image 28
gasche Avatar answered Sep 29 '22 11:09

gasche