
Piping Text To An External Program Appends A Trailing Newline

I have been comparing hash values between multiple systems and was surprised to find that PowerShell's hash values differ from those of other terminals.

Linux terminals (Cygwin, Bash for Windows, etc.) and the Windows Command Prompt all show the same hash, whereas PowerShell shows a different hash value.

[Image: Linux_Vs_PShell_Hash_Compare.png]

I tested this with SHA-256, but found the same issue with other algorithms such as MD5.

Encoding Update:

I tried changing the PowerShell console encoding, but it had no effect on the returned hash values.

[Console]::OutputEncoding.BodyName 
iso-8859-1
[Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8
utf-8

GitHub PowerShell Issue

https://github.com/PowerShell/PowerShell/issues/5974

Pie asked Jan 21 '18 20:01


2 Answers

tl;dr:

When PowerShell pipes a string to an external program:

  • It encodes it using the character encoding stored in the $OutputEncoding preference variable
  • It invariably appends a trailing (platform-appropriate) newline.

Therefore, the key is to avoid PowerShell's pipeline in favor of the native shell's, so as to prevent implicit addition of a trailing newline:

  • If you're running your command on a Unix-like platform (using PowerShell Core):
sh -c "printf %s 'string' | openssl dgst -sha256 -hmac authcode"

printf %s is the portable alternative to echo -n. If the string contains ' characters, double them or use `"...`" quoting instead.

  • In case you need to do this on Windows via cmd.exe, things get even trickier, because cmd.exe doesn't directly support echoing without a trailing newline:
cmd /c "<NUL set /p =`"string`"| openssl dgst -sha256 -hmac authcode"

Note that there must be no space before | for this to work. For an explanation and the limitations of this solution, see this answer.

Encoding issues would only arise if the string contained non-ASCII characters and you're running in Windows PowerShell; in that event, first set $OutputEncoding to the encoding that the target utility expects, typically UTF-8: $OutputEncoding = [Text.Utf8Encoding]::new()
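
To make the encoding point concrete, here is a minimal shell sketch of why a non-ASCII character changes the hash. It assumes a POSIX shell with od, tr, cut and sha256sum available, and it uses "café" as a made-up sample string. The ASCII-only case is simulated by writing a literal ?, on the assumption that an ASCII-only encoder substitutes ? for characters it cannot represent; the helper names are hypothetical.

```shell
#!/bin/sh
# Print stdin as one run of lowercase hex digits.
hex_of_stdin() {
  od -An -tx1 | tr -d ' \t\n'
}

# Print only the digest column of sha256sum's "<digest>  -" output.
sha256_of_stdin() {
  sha256sum | cut -d ' ' -f 1
}

# "café" encoded as UTF-8: é (U+00E9) becomes the two bytes C3 A9,
# written here as octal escapes so the script itself stays pure ASCII.
utf8_hex=$(printf 'caf\303\251' | hex_of_stdin)
utf8_hash=$(printf 'caf\303\251' | sha256_of_stdin)

# "café" after an ASCII-only encoder (simulated): é is replaced by '?'.
ascii_hex=$(printf 'caf?' | hex_of_stdin)
ascii_hash=$(printf 'caf?' | sha256_of_stdin)

echo "UTF-8 bytes:      $utf8_hex"   # 636166c3a9
echo "ASCII-only bytes: $ascii_hex"  # 6361663f
echo "UTF-8 hash:       $utf8_hash"
echo "ASCII-only hash:  $ascii_hash"
```

The two digests differ because the hashed bytes differ, which is why $OutputEncoding needs to match what the receiving utility expects.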


  • PowerShell, as of Windows PowerShell v5.1 / PowerShell (Core) v7.2, invariably appends a trailing newline when you send a string without one via the pipeline to an external utility, which is the reason for the difference you're observing (that trailing newline will be a LF only on Unix platforms, and a CRLF sequence on Windows).

    • You can keep track of efforts to address this problem in GitHub issue #5974, opened by the OP.
  • Additionally, PowerShell's pipeline is invariably text-based when it comes to piping data to external programs; the internally UTF-16LE-based PowerShell (.NET) strings are transcoded based on the encoding stored in the $OutputEncoding preference variable, which defaults to ASCII-only encoding in Windows PowerShell, and to UTF-8 encoding in PowerShell Core (both on Windows and on Unix-like platforms).

    • In PowerShell Core, a change is being discussed for piping raw byte streams between external programs.
  • The fact that echo -n in PowerShell does not produce a string without a trailing newline is therefore incidental to your problem; for the sake of completeness, here's an explanation:

    • echo is an alias for PowerShell's Write-Output cmdlet, which - in the context of piping to external programs - writes text to the standard input of the program in the next pipeline segment (similar to Bash / cmd.exe's echo).
    • -n is interpreted as an (unambiguous) abbreviation for Write-Output's -NoEnumerate switch.
    • -NoEnumerate only applies when writing multiple objects, so it has no effect here.
    • Therefore, in short: in PowerShell, echo -n "string" is the same as Write-Output -NoEnumerate "string", which - because only a single string is output - is the same as Write-Output "string", which, in turn, is the same as just using "string", relying on PowerShell's implicit output behavior.
    • Write-Output has no option to suppress a trailing newline, and even if it did, using a pipeline to pipe to an external program would add it back in.
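
The effect of the appended newline can be reproduced outside PowerShell. The following is a minimal sketch, assuming a POSIX shell with od, tr, cut and sha256sum available (on macOS, shasum -a 256 would be the likely substitute); the helper names are hypothetical. It emulates the three byte streams an external program can receive: the bare string, the string plus LF, and the string plus CRLF.

```shell
#!/bin/sh
# Print stdin as one run of lowercase hex digits.
hex_of_stdin() {
  od -An -tx1 | tr -d ' \t\n'
}

# Print only the digest column of sha256sum's "<digest>  -" output.
sha256_of_stdin() {
  sha256sum | cut -d ' ' -f 1
}

# What a native shell sends with printf %s: the six bytes of "string" only.
bare_hex=$(printf '%s' 'string' | hex_of_stdin)
bare_hash=$(printf '%s' 'string' | sha256_of_stdin)

# Emulates the PowerShell pipeline on Unix-like platforms: LF appended.
lf_hex=$(printf '%s\n' 'string' | hex_of_stdin)
lf_hash=$(printf '%s\n' 'string' | sha256_of_stdin)

# Emulates the PowerShell pipeline on Windows: CRLF appended.
crlf_hex=$(printf '%s\r\n' 'string' | hex_of_stdin)
crlf_hash=$(printf '%s\r\n' 'string' | sha256_of_stdin)

echo "bare: $bare_hex"   # 737472696e67
echo "LF:   $lf_hex"     # 737472696e670a
echo "CRLF: $crlf_hex"   # 737472696e670d0a
echo "bare: $bare_hash"
echo "LF:   $lf_hash"
echo "CRLF: $crlf_hash"
```

All three digests differ, so a hash computed through the PowerShell pipeline matches neither a Unix echo -n result nor, across platforms, another PowerShell result.
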
mklement0 answered Oct 19 '22 17:10


Linux terminals and PowerShell use different encodings, so the actual bytes produced by echo -n "string" differ. I tried it in my Linux Mint terminal and in Windows 10 PowerShell. Here is what I got:

Linux Mint:

73 74 72 69 6E 67

Windows 10:

FF FE 73 00 74 00 72 00 69 00 6E 00 67 00 0D 00 0A 00

It seems that Linux terminals use UTF-8 and Windows PowerShell uses UTF-16 with a BOM. Also, in PowerShell you cannot use the -n parameter with echo, so echo places the newline characters \r\n (0D 00 0A 00) at the end of "string".

Edit: As mklement0 said below, Windows PowerShell uses ASCII by default when piping.
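
For reference, the two dumps above can be rebuilt byte for byte. This is a minimal sketch, assuming a POSIX shell with od, tr, tail and wc; the function names are hypothetical. It shows that the Windows dump is the same text as the Linux one, wrapped in a BOM, widened to two bytes per character, and followed by CRLF.

```shell
#!/bin/sh
# Print stdin as one run of lowercase hex digits.
hex_of_stdin() {
  od -An -tx1 | tr -d ' \t\n'
}

# Emit the Windows 10 dump: BOM (FF FE), then each character as a 2-byte
# little-endian code unit, ending with CR and LF.
emit_utf16le_sample() {
  printf '\377\376s\000t\000r\000i\000n\000g\000\r\000\n\000'
}

# The Linux Mint dump: the plain bytes of "string".
utf8_hex=$(printf '%s' 'string' | hex_of_stdin)

utf16_hex=$(emit_utf16le_sample | hex_of_stdin)
utf16_len=$(emit_utf16le_sample | wc -c | tr -d ' ')

# Skip the 2-byte BOM and delete every NUL high byte: what remains is
# "string" followed by CR LF.
stripped_hex=$(emit_utf16le_sample | tail -c +3 | tr -d '\000' | hex_of_stdin)

echo "Linux Mint:       $utf8_hex"      # 737472696e67
echo "Windows 10:       $utf16_hex"     # fffe73007400720069006e0067000d000a00
echo "Windows 10 bytes: $utf16_len"     # 18
echo "BOM/NULs removed: $stripped_hex"  # 737472696e670d0a
```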

Igor answered Oct 19 '22 15:10