How to pass UTF-8 characters to clip.exe with PowerShell without conversion to another charset?

I'm a Windows and PowerShell newbie, coming from Linux land. I used to have a little Bash alias in my .bashrc that copied a "shruggie" (¯\_(ツ)_/¯) to the clipboard so that I could paste it into conversations on Slack and such.

My Bash alias looked like this: alias shruggie='printf "¯\_(ツ)_/¯" | xclip -selection c && echo "¯\_(ツ)_/¯"'

I realize that this question is juvenile, but the answer has value to me: I'm sure I will need to pipe odd UTF-8 characters to output in a PowerShell script at some point in the future.

I wrote this function in my PowerShell profile:

function shruggie() {
  '¯\_(ツ)_/¯' | clip
  Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}

However, this gives me: ??\_(???)_/?? (Unknown UTF-8 chars are converted to ?) when I call it on the command line.

I've looked at [System.Text.Encoding]::UTF8 and some other questions but I don't know how to cast my string as UTF-8 and pass that through clip.exe and receive UTF-8 out on the other side (on the clipboard).

jonathanbell asked Dec 29 '17 00:12


3 Answers

There are two distinct, independent aspects:

  • copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe
  • writing (echoing) ¯\_(ツ)_/¯ to the console

Prerequisite: PowerShell must properly recognize your source code's encoding in order for the solutions below to work: if your source code is UTF-8-encoded, be sure to save the enclosing files as UTF-8 with BOM for Windows PowerShell to recognize it.

  • Windows PowerShell, in the absence of a BOM, interprets source as "ANSI"-encoded, referring to the legacy, single-byte, extended-ASCII code page in effect, such as Windows-1252 on US-English systems, and would therefore interpret UTF-8-encoded source code incorrectly.

  • Note that, by contrast, PowerShell Core uses UTF-8 as the default, so the BOM is no longer necessary (but still recognized).
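To check the prerequisite above, a minimal sketch along the following lines could report whether a script file starts with the UTF-8 BOM (bytes EF BB BF) and re-save it with one. The function names Test-Utf8Bom and Add-Utf8Bom are hypothetical helpers made up for illustration, not built-in cmdlets, and the sketch assumes the file's content really is UTF-8.

```powershell
# Hypothetical helper (not a built-in cmdlet): does the file start with the UTF-8 BOM EF BB BF?
function Test-Utf8Bom {
    param([Parameter(Mandatory=$true)][string] $Path)
    # Resolve to a full path, because .NET file APIs ignore PowerShell's current location.
    $full  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($full)
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Hypothetical helper: re-save a UTF-8 file as UTF-8 with BOM (assumes the content is UTF-8).
function Add-Utf8Bom {
    param([Parameter(Mandatory=$true)][string] $Path)
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    $text = [System.IO.File]::ReadAllText($full, [System.Text.Encoding]::UTF8)
    # UTF8Encoding($true) emits the BOM when the file is written.
    [System.IO.File]::WriteAllText($full, $text, (New-Object System.Text.UTF8Encoding $true))
}
```

For example, Test-Utf8Bom $PROFILE would show whether the profile that defines shruggie needs re-saving before Windows PowerShell reads it correctly.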


Copying ¯\_(ツ)_/¯ to the clipboard, using clip.exe:

  • In Windows PowerShell v5.0+, you can use the built-in Set-Clipboard cmdlet to copy text to the clipboard from within PowerShell; given that PowerShell uses the .NET System.String type that is capable of representing all Unicode characters, there are no encoding issues.

    • Note that PowerShell Core, even when run on Windows, does NOT have this cmdlet (as of PowerShell Core v6.0.0-rc.2)
    • See this answer of mine for clipboard functions that work in earlier PowerShell versions as well as in PowerShell Core.
  • In earlier versions of Windows PowerShell and in PowerShell Core, use of clip.exe is a viable alternative, but its use requires additional work:

function shruggie() {
  $OutputEncoding = (New-Object System.Text.UnicodeEncoding $False, $False).psobject.BaseObject
  '¯\_(ツ)_/¯' | clip
  Write-Verbose -Verbose "Shruggie copied to clipboard." # see section about console output
}

  • New-Object System.Text.UnicodeEncoding $False, $False creates a BOM-less UTF-16LE encoding, which clip.exe understands.

    • The magic .psobject.BaseObject incantation is, unfortunately, required to work around a bug; in PSv5+, you can bypass this bug by using the following instead:
      [System.Text.UnicodeEncoding]::new($False, $False)
  • Assigning that encoding to preference variable $OutputEncoding ensures that PowerShell uses that encoding to pipe data to external utility clip.exe.
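To make the difference between the encodings concrete, the following sketch prints the bytes that the BOM-less UTF-16LE encoding from above produces for ツ, next to the UTF-8 bytes. It only inspects bytes and does not call clip.exe; the variable names are made up for illustration, and the character is built from its code point so the example does not depend on how the script file itself is saved.

```powershell
# Build the character from its code point (U+30C4, KATAKANA LETTER TU).
$tu = [string][char]0x30C4

$utf16 = New-Object System.Text.UnicodeEncoding $false, $false   # little-endian, no BOM
$utf8  = New-Object System.Text.UTF8Encoding $false              # no BOM

# Render each byte array as space-separated hex.
$utf16Hex = ($utf16.GetBytes($tu) | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex  = ($utf8.GetBytes($tu)  | ForEach-Object { $_.ToString('X2') }) -join ' '

# Number of BOM bytes this encoding would emit (0 for a BOM-less encoding).
$bomLength = $utf16.GetPreamble().Length

"UTF-16LE: $utf16Hex"    # UTF-16LE: C4 30
"UTF-8:    $utf8Hex"     # UTF-8:    E3 83 84
"BOM bytes: $bomLength"  # BOM bytes: 0
```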


Writing ¯\_(ツ)_/¯ to the console:

Note: PowerShell Core on Unix platforms generally uses consoles (terminals) with a default encoding of (BOM-less) UTF-8, so no additional work is needed there.

To merely echo (print) Unicode characters (beyond the 8-bit range), it is sufficient to switch to a font that can display Unicode characters (beyond the extended ASCII range), because, as PetSerAl points out, PowerShell uses the Unicode version of the WriteConsole Windows API function to print to the console.

To support (most) Unicode characters, you must switch to one of the "TT" (TrueType) fonts.

PetSerAl points out in a comment that console windows on Windows are currently limited to a single 16-bit code unit per output character (cell); given that only (most of) the characters in the BMP (Basic Multilingual Plane) are self-contained 16-bit code units, the (rare) characters beyond the BMP cannot be represented.

Sadly, even that may not be enough for some (BMP) Unicode characters, given that the Unicode standard is versioned and font representations / implementations may lag.

Indeed, as of Windows 10 release ID 1703, only a select few fonts can render ツ (Unicode character KATAKANA LETTER TU, U+30C4, UTF-8: E3 83 84):

  • MS Gothic
  • NSimSum
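As a rough illustration of the 16-bit-code-unit limit mentioned above, the sketch below compares ツ, which lies inside the BMP, with a character outside it. U+1F600 is an arbitrary example chosen here, not one taken from the discussion above, and the variable names are made up for illustration.

```powershell
$tu    = [char]::ConvertFromUtf32(0x30C4)    # KATAKANA LETTER TU, inside the BMP
$emoji = [char]::ConvertFromUtf32(0x1F600)   # a character outside the BMP

# .NET strings count UTF-16 code units, not characters.
$tuUnits    = $tu.Length      # 1: a single 16-bit code unit
$emojiUnits = $emoji.Length   # 2: a surrogate pair

# True when the two code units at index 0 form a surrogate pair.
$isPair = [char]::IsSurrogatePair($emoji, 0)

"ツ code units: $tuUnits"
"U+1F600 code units: $emojiUnits (surrogate pair: $isPair)"
```

A character that needs two code units, like the second one, is the kind that a console limited to one code unit per cell cannot represent.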

Note that if you want to (also) change how other applications interpret such output, you must again set $OutputEncoding:

For instance, to make PowerShell expect UTF-8 input from external utilities as well as output UTF-8-encoded data to external utilities, use the following:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

The above implicitly changes the code page to 65001 (UTF-8), as reflected in chcp (chcp.com).

Note that, for backward compatibility, Windows console windows still default to the single-byte, extended-ASCII legacy OEM code page, such as 437 on US-English systems.

Unfortunately, as of v6.0.0-rc.2, this also applies to PowerShell Core, even though it has otherwise switched to BOM-less UTF-8 as the default encoding, as also reflected in $OutputEncoding.
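To see which encodings are actually in effect in a given session, a small read-only sketch like the following may help. Get-EncodingSummary and its property names are hypothetical, made up for illustration; the function only reads the settings and changes nothing.

```powershell
# Hypothetical helper (not a built-in cmdlet): summarize the encodings currently in effect.
function Get-EncodingSummary {
    New-Object PSObject -Property @{
        # Encoding PowerShell uses when piping data to external programs.
        OutputEncodingPreference = $OutputEncoding.WebName
        # Encodings the console uses for input and output.
        ConsoleInputEncoding     = [Console]::InputEncoding.WebName
        ConsoleOutputEncoding    = [Console]::OutputEncoding.WebName
        # The numeric code page, comparable to what chcp reports.
        ConsoleOutputCodePage    = [Console]::OutputEncoding.CodePage
    }
}

Get-EncodingSummary | Format-List
```

Running it before and after the $OutputEncoding assignment above should show the switch; per the note above, the console code page would then be 65001.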

mklement0 answered Sep 30 '22 19:09


If you cannot use PowerShell 5's Set-Clipboard cmdlet (which is IMO the go-to solution), you can convert/encode your output in a way that clip.exe understands.

There are two ways to achieve what you want here:

  1. Feed clip.exe with a UTF-16 file: clip < UTF16-Shruggie.txt
    The important part here is to save the file with the encoding named "Unicode" (which means UTF-16, little-endian byte order, with BOM).
  2. Encode the string appropriately (the following works in a PoSh editor like ISE but unfortunately not in a regular console; see mklement0's answer for how to achieve this there):
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
function shruggie() {
  [System.Text.Encoding]::Default.GetString(
    [System.Text.Encoding]::UTF8.GetBytes('¯\_(ツ)_/¯')
  ) | clip.exe
  Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}
shruggie

This works for me. Here is an MSDN blog post that gives further explanations about $OutputEncoding/[Console]::OutputEncoding.

wp78de answered Sep 30 '22 19:09


The Set-Clipboard option posted above is the most direct answer but, as noted, it requires PoSH v5 or higher. Also, depending on which OS the OP is on, not all cmdlets are available on all OS/PoSH versions. This is not to say that Set-Clipboard is unavailable, but since the OP says they're new, it's just a heads-up.

If you can't go there for whatever reason, you can create your own function and/or use add-on modules. See this post:

Convert Keith Hill's PowerShell Get-Clipboard and Set-Clipboard to a PSM1 script

The results from using the Set-Clipboard function from the above post and modifying the OP's post for its use:

(Get-CimInstance -ClassName Win32_OperatingSystem).Caption
Microsoft Windows Server 2012 R2 Standard

$PSVersionTable

Name                           Value
----                           -----
PSVersion                      4.0
WSManStackVersion              3.0
SerializationVersion           1.1.0.1
CLRVersion                     4.0.30319.42000
BuildVersion                   6.3.9600.18773
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0}
PSRemotingProtocolVersion      2.2



function Set-ClipBoard 
{
    Param
    (
        [Parameter(ValueFromPipeline=$true)]
        [string] $text
    )
    Add-Type -AssemblyName System.Windows.Forms
    $tb = New-Object System.Windows.Forms.TextBox
    $tb.Multiline = $true
    $tb.Text = $text
    $tb.SelectAll()
    $tb.Copy()
}

function New-Shruggie
{
    Set-ClipBoard -text '¯\_(ツ)_/¯'
    Write-Host '¯\_(ツ)_/¯ copied to clipboard.' -foregroundcolor yellow
}

New-Shruggie

¯\_(ツ)_/¯ copied to clipboard.

Results pasted from clipboard

¯\_(ツ)_/¯

There are other options, such as the following, but the above is still the best route.

First, remember that output is controlled by both the console's code page and the interpreter (PoSH). The console defaults to a legacy OEM code page, and Windows PowerShell's $OutputEncoding defaults to ASCII.

You can see the PoSH default CP settings by looking at the output of the built-in variable

$OutputEncoding

As PoSH creator Jeffrey Snover says: "The reason we convert to ASCII when piping to existing executables is that most commands today do not process UNICODE correctly. Some do, most don't."

So, all that being said ... you can change the code page with something like...

[Console]::OutputEncoding = New-Object -typename System.Text.UTF8Encoding

Or ...

$OutputEncoding = New-Object -typename System.Text.UTF8Encoding

If sending output to a file...

$OutPutData | Out-File $outFile -Encoding UTF8

postanote answered Sep 30 '22 19:09