
How do I change my Powershell script so that it writes out-file in ANSI - Windows-1252 encoding?

Tags:

powershell

I have a banking application script that generates a “filtered” output file by removing error records from a daily input bank file (see How do I create a Windows Server script to remove error records, AND the previous record to each, from a file with the results written to a NEW file). The “filtered” output file will be sent to the State for updating their system. As a side note, the original input files that we receive from the bank show as Unix 1252 (ANSI Latin 1) in my file editor (UltraEdit), and each record ends only with a line feed.

I sent a couple of test output files generated from both “clean” (no errors) and “dirty” (contained 4 errors) input files to the State for testing on their end to make sure all was good before implementation, but was a little concerned because the output files were generated in UTF-16 encoding with CRLF line endings, where the input and current unfiltered output are encoded in Windows-1252. All other output files on this system are Windows-1252 encoded.

Sure enough… I got word back that the encoding is incorrect for the state’s system. Their comments were: “The file was encoded UCS-2 Little Endian and needed to be converted to ANSI to run on our system. That was unexpected.

After that the file with no detail transactions would run through our EFT rejects program ok.

It seems that it was processed ok, but we had to do some conversion. Can it be sent in ANSI or needs to be done in UCS 2 Little Endian?”

I have tried unsuccessfully adding -Encoding "Windows-1252" and -Encoding windows-1252 to my Out-File statement, with both returning the message:

    Out-File : Cannot validate argument on parameter 'Encoding'. The argument "Windows-1252" does not belong to the set
    "unknown,string,unicode,bigendianunicode,utf8,utf7,utf32,ascii,default,oem" specified by the ValidateSet attribute.
    Supply an argument that is in the set and then try the command again.
    At C:\EZTRIEVE\PwrShell\TEST2_FilterR02.ps1:47 char:57
    + ... OutputStrings | Out-File $OutputFileFiltered -Encoding "Windows-1252"
    +                                                            ~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidData: (:) [Out-File], ParameterBindingValidationException
        + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand

I’ve looked high and low for help with this for days, but nothing is really clear, and the vast majority of what I found involved converting FROM Windows-1252 TO another encoding. Yesterday, I found a comment somewhere on Stack Overflow saying that “ANSI” is the same as Windows-1252, but so far I have not found anything that shows me how to properly pass the Windows-1252 encoding option to my Out-File statement so that PowerShell will accept it. I really need to get this project finished so I can tackle the next several that have been added to my queue. Is there possibly a subparameter that I’m missing that needs to be appended to -Encoding?

This is being tested under Dollar Universe (job scheduler) on a new backup server running Windows Server 2016 Standard with Powershell 5.1. Our production system runs Dollar Universe on Windows Server 2012 R2, also with Powershell 5.1 (yes, we are looking for a sufficient upgrade window :-)

As of my last attempt, my PowerShell script is:

[cmdletbinding()]
Param
(
    [string] $InputFilePath
)

# Read the text file
$InputFile = Get-Content $InputFilePath

# Initialize output record counter
$Inrecs = 0
$Outrecs = 0

# Get the time
$Time = Get-Date -Format "MM_dd_yy"

# Set up the output file name
$OutputFileFiltered = "C:\EZTRIEVE\CFIS\DATA\TEST_CFI_EFT_RETURN_FILTERED"

# Initialize the variable used to hold the output
$OutputStrings = @()

# Loop through each line in the file
# Check the line ahead for "R02" and add it to the output
# or skip it appropriately
for ($i = 0; $i -lt $InputFile.Length - 1; $i++)
{
    if ($InputFile[$i + 1] -notmatch "R02")
    {
        # The next record does not contain "R02", increment count and add it to the output
        $Outrecs++
        $OutputStrings += $InputFile[$i]
    }
    else
    {
        # The next record does contain "R02", skip it
        $i++
    }
}

# Add the trailer record to the output
$OutputStrings += $InputFile[$InputFile.Length - 1]

# Write the output to a file
# $OutputStrings | Out-File $OutputFileFiltered
$OutputStrings | Out-File $OutputFileFiltered -Encoding windows-1252

# Display record processing stats:

$Filtered = $Outrecs-$i

Write-Host $i  Input records processed

Write-Host $Filtered  Error records filtered out

Write-Host $Outrecs  Output records written
K9-Guy asked Jan 01 '23 03:01


1 Answer

Note:

  • You later clarified that you need LF (Unix-style) newlines - see the bottom section.

  • The next section deals with the question as originally asked and presents solutions that result in files with CRLF (Windows-style) newlines (when run on Windows).


If your system's Language for non-Unicode programs setting (a.k.a. the system locale) happens to have Windows-1252 as the active ANSI code page (e.g., on US-English or Western European systems), use -Encoding Default, because Default refers to that code page in Windows PowerShell (but not in PowerShell Core, which defaults to BOM-less UTF-8 and doesn't support the Default encoding identifier).

Verify with: (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP) -eq '1252'

... | Out-File -Encoding Default $file

Note:

  • If you are certain that your data is actually composed exclusively of ASCII-range characters (characters with code points in the 7-bit range, which excludes accented characters such as ü), -Encoding Default will work even if your system locale uses an ANSI code page other than Windows-1252, given that all (single-byte) ANSI code pages share all ASCII characters in their 7-bit subrange; you could then also use -Encoding ASCII, but note that if there are non-ASCII characters present after all, they will be transliterated to literal ? chars., resulting in loss of information.

  • The Set-Content cmdlet actually defaults to the Default encoding in Windows PowerShell (but not PowerShell Core, where the consistent default is UTF-8 without BOM).

  • While Set-Content's stringification behavior differs from that of Out-File - see this answer - it's actually the better choice if the objects to write to the file already are strings.
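
To illustrate the first note above about non-ASCII characters, here is a minimal sketch that compares what the ASCII and Windows-1252 encodings do with the character ü. The sample string is made up for the demonstration, and the character is built from its code point so that the result does not depend on how the script file itself is saved:

```powershell
# Hypothetical sample text: "Müller", with ü (U+00FC) built from its code point.
$text = "M$([char]0xFC)ller"

# Encode the same string with ASCII and with Windows-1252 (code page 1252).
$asciiBytes = [Text.Encoding]::ASCII.GetBytes($text)
$ansiBytes  = [Text.Encoding]::GetEncoding(1252).GetBytes($text)

# ASCII cannot represent ü, so it is replaced with a literal "?" (byte 63).
[Text.Encoding]::ASCII.GetString($asciiBytes)   # M?ller

# Windows-1252 stores ü as the single byte 0xFC (252).
$ansiBytes[1]                                   # 252
```

If the data can contain any such characters, -Encoding ASCII silently loses them, whereas a Windows-1252 encoding preserves them as single bytes.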


Otherwise, you have two options:

  • Use the .NET Framework file I/O functionality directly, where you can use any encoding supported by .NET; e.g.:

      $lines = ...  # array of strings (to become lines in a file)
      # CAVEAT: Be sure to specify an *absolute file path* in $file,
      #         because .NET typically has a different working dir.
      [IO.File]::WriteAllLines($file, $lines, [Text.Encoding]::GetEncoding(1252))
    
  • Use PowerShell Core, which allows you to pass any supported .NET encoding to the
    -Encoding parameter:

      ... | Out-File -Encoding ([Text.Encoding]::GetEncoding(1252)) $file
    

Note that in PSv5.1+ you can actually change the encoding used by the > and >> operators, as detailed in this answer.
However, in Windows PowerShell you are again limited to the encodings supported by Out-File's -Encoding parameter.


Creating text files with LF (Unix-style) newlines on Windows:

PowerShell (invariably) and .NET (by default) use the platform-appropriate newline sequence - as reflected in [Environment]::NewLine - when writing strings as lines to a file. In other words: on Windows you'll end up with files with CRLF newlines, and on Unix-like platforms (PowerShell Core) with LF newlines.

Note that the solutions below assume that the data to write to your file is an array of strings that represent the lines to write, as returned by Get-Content, for instance (where the resulting array elements are the input file's lines without their trailing newline sequence).

To explicitly create a file with LF newlines on Windows (PSv5+):

$lines = ...  # array of strings (to become lines in a file)

($lines -join "`n") + "`n" | Set-Content -NoNewline $file

"`n" produces a LF character.

Note:

  • In Windows PowerShell this implicitly uses the active ANSI code page's encoding.

  • In PowerShell Core this implicitly creates a UTF-8 file without BOM. If you want to use the active ANSI code page instead, use:

    -Encoding ([Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)))
    

In PSv4- (PowerShell version 4 or lower), you'll have to use the .NET Framework directly:

$lines = ...  # array of strings (to become lines in a file)


# CAVEAT: Be sure to specify an *absolute file path* in $file,
#         because .NET typically has a different working dir.
[IO.File]::WriteAllText($file, ($lines -join "`n") + "`n")

Note:

  • In both Windows PowerShell and PowerShell Core this creates a UTF-8 file without BOM.

  • If you want to use the active ANSI code page instead, pass the following as an additional argument to WriteAllText():

    ([Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)))
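
Putting the two requirements together (Windows-1252 encoding and LF-only newlines), a minimal sketch could look like the following. It requests code page 1252 explicitly instead of looking up the active ANSI code page, which is an assumption that fits the scenario in the question; the sample lines and the temp-file path are purely hypothetical placeholders:

```powershell
# Hypothetical sample data and output path, for illustration only.
$lines = 'HEADER', 'DETAIL 001', 'TRAILER'
$file  = Join-Path ([IO.Path]::GetTempPath()) 'filtered_1252_lf.txt'   # absolute path

# Ask for Windows-1252 explicitly, independent of the system locale.
$enc = [Text.Encoding]::GetEncoding(1252)

# Join the lines with LF only and terminate the last line with LF as well.
[IO.File]::WriteAllText($file, ($lines -join "`n") + "`n", $enc)

# Sanity check on the raw bytes: no CR (13) anywhere, and the file ends in LF (10).
$bytes = [IO.File]::ReadAllBytes($file)
$bytes -contains 13     # False
$bytes[-1] -eq 10       # True
```

In the script from the question, $lines would correspond to $OutputStrings and $file to $OutputFileFiltered.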
    
mklement0 answered Jan 03 '23 15:01