
Why does this code print a different result between Windows and Linux?

The following script prints different output on Windows and on Linux.

test.py:

print(";".join([str(i) for i in range(10000)]))

Platform: x86_64 Linux 4.4.0-17763-Microsoft
Python version: 3.7.2
Terminals: bash, fish

Abbreviated output:

$ python --version
Python 3.7.2
$ python test.py
0;1;2;3;4;5;6....9997;9998;9999
$ python -u test.py
0;1;2;3;4;5;6....9997;9998;9999

Platform: Windows 10 1809
Python version: 3.6.8, 3.7.0, 3.7.2
Terminals: cmd, powershell

Abbreviated output:

./python --version
Python 3.6.8
./python test.py
0;1;2;3;4;5;6....9997;9998;9999
./python -u test.py
0;1;2;3;4;5;6....2663;2664;2665;26
./python --version
Python 3.7.0
./python test.py
0;1;2;3;4;5;6....9997;9998;9999
./python -u test.py
0;1;2;3;4;5;6....2663;2664;2665;26
./python --version
Python 3.7.2
./python test.py
0;1;2;3;4;5;6....9997;9998;9999
./python -u test.py
0;1;2;3;4;5;6....2663;2664;2665;26

So why, on Windows, does the -u option cause the output to be truncated (it stops partway through 2666)?
(When I redirect the output to a file with python -u test.py > a.txt, it works correctly.)

Maybe something about buffering?

OhYee asked Jan 19 '19 10:01


1 Answer

The size of a console write via the Windows API functions WriteFile and WriteConsoleW is documented to have a vaguely defined limit, as follows:

nNumberOfCharsToWrite [in]
The number of characters to be written. If the total size of the specified number of characters exceeds the available heap, the function fails with ERROR_NOT_ENOUGH_MEMORY.

It's not documented to which "heap" this is referring. A process can have multiple heaps of various sizes (fixed or dynamic). The native heap implementation in the NT runtime library (e.g. RtlCreateHeap) can create a heap at a specified address, which allows convenient access to memory that's shared with other processes. Using a shared heap is often combined with Local Inter-Process Communication (LPC) ports -- or Asynchronous LPC in NT 6.0+. LPC ports are used to pass messages between applications and system services, such as the session manager (smss.exe), service control manager (services.exe), local security authority (lsass.exe), desktop session server (csrss.exe), and instances of the console host (conhost.exe). Messages queued directly to an LPC port are limited to 256 bytes. Larger messages are passed by queuing a message to the port that references shared memory.

It turns out that the old implementation of the console (prior to NT 6.3) uses LPC as an I/O channel, and the above-mentioned heap is only 64 KiB. This was a peculiar choice of design. I think someone was drinking too much of the user-mode subsystem, message-passing Kool-Aid. Proper NT I/O uses a device with I/O system services, including NtCreateFile, NtReadFile, NtWriteFile, and NtDeviceIoControlFile.

A console application doesn't know how much of this heap is available for a write. Python could start at 64 KiB and work its way down, but its raw file I/O mandates one system call per call. Instead it caps writes at 32 KiB, which should succeed. This limit allows writing wide-character strings of up to 16K UTF-16 code units. A complication is that the console I/O stack uses UTF-8 in 3.6+, which has to be decoded via MultiByteToWideChar. Currently it just repeatedly divides the UTF-8 buffer in half until the resulting length is less than 16K. Thus, in the question's example, writing 48,889 characters gets halved to 24,444 characters and halved again to 12,222 characters. (IMO, it would be better to try writing up to 16K UTF-16 code units, get the number actually written, and call WideCharToMultiByte on the substring to determine the number of UTF-8 bytes written. The current design actually has a bug if a 2-4 byte UTF-8 sequence overlaps a cut point.)
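The halving arithmetic can be checked with a short simulation. This is only a sketch of the behaviour described above, not CPython's actual code: the names LIMIT, utf16_units and bytes_accepted are made up for illustration, the 16K threshold is taken from the description, and decoding with errors="replace" is an assumed stand-in for what MultiByteToWideChar does with a cut multi-byte sequence.

```python
# Assumed cap: 32 KiB of wide characters = 16K UTF-16 code units.
LIMIT = 16 * 1024


def utf16_units(data: bytes) -> int:
    """Count the UTF-16 code units that a UTF-8 byte string decodes to."""
    text = data.decode("utf-8", errors="replace")
    return len(text.encode("utf-16-le")) // 2


def bytes_accepted(data: bytes, limit: int = LIMIT) -> int:
    """Halve the byte count until the decoded length is below the limit."""
    n = len(data)
    while utf16_units(data[:n]) >= limit:
        n //= 2
    return n


# Same text that test.py prints (without the trailing newline).
payload = ";".join(str(i) for i in range(10000)).encode("utf-8")
n = bytes_accepted(payload)

print(len(payload))               # 48889
print(n)                          # 12222
print(payload[:n][-12:].decode()) # 2664;2665;26
```

The last line matches the tail of the truncated output in the question: the first 12,222 characters end partway through "2666".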

In NT 6.3+ (Windows 8.1+), console I/O doesn't have this size limit because it uses the ConDrv device and I/O system calls instead of LPC. However, it's not worth special-casing the code just to support an unbuffered text I/O stack, as configured by the -u command-line option. We expect interactive console I/O to be buffered. Unbuffered text I/O is actually disallowed with a normal open call. For example:

>>> open('conout$', 'w', buffering=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: can't have unbuffered text I/O
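The difference between the buffered and unbuffered cases can be illustrated without a Windows console. In this sketch, CappedRaw is a hypothetical raw stream that accepts at most a fixed number of bytes per write call, standing in for the console's raw file object; the cap of 12222 is simply the value from the question. A single raw write is cut short, while io.BufferedWriter keeps calling the raw write until everything has been delivered.

```python
import io


class CappedRaw(io.RawIOBase):
    """Hypothetical raw stream that accepts at most `cap` bytes per write."""

    def __init__(self, cap):
        super().__init__()
        self.cap = cap
        self.received = bytearray()

    def writable(self):
        return True

    def write(self, b):
        # Accept only the first `cap` bytes and report a partial write.
        chunk = bytes(b)[:self.cap]
        self.received += chunk
        return len(chunk)


data = ";".join(str(i) for i in range(10000)).encode("ascii")

# One raw write, as in the unbuffered case: the rest is never written.
raw = CappedRaw(cap=12222)
raw.write(data)
unbuffered_len = len(raw.received)

# Buffered case: the writer retries after each partial write.
raw2 = CappedRaw(cap=12222)
buffered = io.BufferedWriter(raw2)
buffered.write(data)
buffered.flush()
buffered_len = len(raw2.received)

print(unbuffered_len, buffered_len)
```

A caller that must write to a raw stream directly can get the same effect by looping on the returned byte count until the whole buffer has been written.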

Extended support for Windows 7 ends on 14 January 2020, so Python 3.8 will be the last version to support it. The console write limit should be removed in Python 3.9.

Eryk Sun answered Sep 30 '22 21:09