Read UTF-8 files correctly with PowerShell

Tags: powershell, encoding, utf-8

The situation is as follows:

A PowerShell script creates a file with UTF-8 encoding
The user may or may not edit the file. Editing may drop the BOM and change the line separators, but the encoding should stay UTF-8
The same PowerShell script reads the file, adds some more content and writes everything back to the same file as UTF-8
This can be repeated many times

With Get-Content and Out-File -Encoding UTF8, the file is not read back correctly. The read stumbles over the BOM that the script wrote earlier (the BOM ends up in the content and breaks my parsing regex), it does not decode the file as UTF-8, and it even drops line breaks from the original content.

I need a function that can read any UTF-8 encoded file, strip the BOM if there is one, and otherwise leave the content unmodified. What should I use?

Update

I have added a little test script that shows what I'm trying to do and what happens instead.

# Read data if exists
$data = ""
$startRev = 1;
if (Test-Path test.txt)
{
    $data = Get-Content -Path test.txt
    if ($data -match "^[0-9-]{10} - r([0-9]+)")
    {
        $startRev = [int]$matches[1] + 1
    }
}
Write-Host Next revision is $startRev

# Define example data to add
$startRev = $startRev + 10
$newMsgs = "2014-04-01 - r" + $startRev + "`r`n`r`n" + `
    "Line 1`r`n" + `
    "Line 2`r`n`r`n"

# Write new data back
$data = $newMsgs + $data
$data | Out-File test.txt -Encoding UTF8

After running the script a few times, new sections should have been added to the beginning of the file. The existing content should not be altered in any way (currently it loses line breaks), and no additional newlines should be appended to the end of the file (which seems to happen sometimes).

Instead, the second run gives me an error.


asked Apr 01 '14 14:04

ygoe

2 Answers

If the file is supposed to be UTF-8, why not decode it as UTF-8 when you read it:

Get-Content -Path test.txt -Encoding UTF8
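
As a complement, here is a minimal sketch of a read/write pair built on the .NET System.IO.File methods, assuming the whole file fits comfortably in memory. The function names Read-Utf8Text and Write-Utf8Text are made up for this illustration and are not part of the answer above. ReadAllText skips a leading UTF-8 BOM if one is present and returns the text as a single string with its line breaks unchanged; passing a UTF8Encoding created with $false to WriteAllText writes the file without a BOM.

```powershell
# Hypothetical helper: read a UTF-8 file as one string, with or without a BOM.
function Read-Utf8Text {
    param([Parameter(Mandatory = $true)][string]$Path)
    # Resolve against PowerShell's current location, which .NET does not track
    $full = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    # ReadAllText skips a leading BOM and leaves the line breaks as they are
    [System.IO.File]::ReadAllText($full, [System.Text.Encoding]::UTF8)
}

# Hypothetical helper: write a string as UTF-8 without a BOM and without extra newlines.
function Write-Utf8Text {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][AllowEmptyString()][string]$Text
    )
    $full = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    # $false means: do not emit a byte order mark
    $encoding = New-Object System.Text.UTF8Encoding $false
    [System.IO.File]::WriteAllText($full, $Text, $encoding)
}
```

In the test script this could look like `$data = Read-Utf8Text test.txt` for reading and `Write-Utf8Text test.txt ($newMsgs + $data)` for writing. Because $data is then a single string, the + is plain string concatenation and nothing is appended after the last character.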

answered Dec 06 '22 11:12

JPBlanc

JPBlanc is right: if you want the file read as UTF-8, specify that when you read it.

On a side note, you're losing formatting with the [String] + [String] concatenation, and your regex match doesn't work. Take a look at the changed regex search, the changed definition of $newMsgs, and the way I'm writing the data to the file.

# Read data if exists
$data = ""
$startRev = 1;
if (Test-Path test.txt)
{
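    # Decode the file explicitly as UTF-8 when reading it
    $data = Get-Content -Path test.txt -Encoding UTF8
    # For a multi-line file $data is an array of lines, so -match only tests whether any line matches
    if($data -match "\br([0-9]+)\b"){
        # [regex]::Match converts the array to one string and returns the first match in it
        $startRev = [int]([regex]::Match($data,"\br([0-9]+)\b")).groups[1].value + 1
    }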
    }
}
Write-Host Next revision is $startRev

# Define example data to add
$startRev = $startRev + 10
# The here-string already contains its line breaks, so no `r`n escapes or indentation are needed
$newMsgs = @"
2014-04-01 - r$startRev

Line 1
Line 2

"@

# Write new data back
$newmsgs,$data | Out-File test.txt -Encoding UTF8
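
One likely reason the original match fails is that Get-Content returns an array of lines for a multi-line file, and -match behaves differently on arrays than on a single string. The sketch below uses made-up sample lines, not the asker's actual file, to show the difference:

```powershell
# Sample lines standing in for what Get-Content returns for a multi-line file
$lines = @("2014-04-01 - r5", "", "Line 1", "Line 2")

# Array on the left: -match returns the matching elements and does not fill $matches
$filtered = $lines -match "^[0-9-]{10} - r([0-9]+)"

# Single string on the left: -match returns a Boolean and fills $matches
$text = $lines -join "`r`n"
$rev = 0
if ($text -match "^[0-9-]{10} - r([0-9]+)") {
    $rev = [int]$matches[1]
}
```

In PowerShell 3.0 and later, Get-Content -Raw returns the file as one string, which gives the second behavior without joining the lines by hand.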

answered Dec 06 '22 13:12

TheMadTechnician
