I have the following situation: I read a file with Get-Content and write it back with Out-File -Encoding UTF8, and I have problems reading it correctly afterwards. Get-Content stumbles over the BOM that Out-File wrote earlier (it ends up in the content and breaks my parsing regex), does not use UTF-8 encoding, and even deletes line breaks in the original content.
I need a function that can read any file with UTF-8 encoding, ignore and delete the BOM and not modify the content. What should I use?
Update
I have added a little test script that shows what I'm trying to do and what happens instead.
# Read data if exists
$data = ""
$startRev = 1;
if (Test-Path test.txt)
{
$data = Get-Content -Path test.txt
if ($data -match "^[0-9-]{10} - r([0-9]+)")
{
$startRev = [int]$matches[1] + 1
}
}
Write-Host Next revision is $startRev
# Define example data to add
$startRev = $startRev + 10
$newMsgs = "2014-04-01 - r" + $startRev + "`r`n`r`n" + `
"Line 1`r`n" + `
"Line 2`r`n`r`n"
# Write new data back
$data = $newMsgs + $data
$data | Out-File test.txt -Encoding UTF8
After running it a few times, new sections should be added to the beginning of the file. The existing content should not be altered in any way (it currently loses its line breaks), and no additional newlines should be added at the end of the file (this seems to happen sometimes).
Instead, the second run gives me an error.
In Windows PowerShell, any Unicode encoding except UTF7 always creates a byte order mark (BOM). For more information, see the Byte order mark documentation.
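To see whether a given file actually starts with the UTF-8 BOM (the bytes EF BB BF), something like the following rough sketch could be used. The helper name Test-Utf8Bom and the temp file names are made up for illustration. The two sample files are written through .NET rather than Out-File, so that the result does not depend on which PowerShell version is running.

```powershell
# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom {
    param([string]$Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and
            $bytes[1] -eq 0xBB -and
            $bytes[2] -eq 0xBF)
}

# Write the same text once with and once without a BOM (illustrative temp files).
$dir     = [System.IO.Path]::GetTempPath()
$withBom = Join-Path $dir 'bom-demo-with.txt'
$noBom   = Join-Path $dir 'bom-demo-without.txt'

[System.IO.File]::WriteAllText($withBom, 'hello', (New-Object System.Text.UTF8Encoding $true))
[System.IO.File]::WriteAllText($noBom,   'hello', (New-Object System.Text.UTF8Encoding $false))

Test-Utf8Bom $withBom   # True
Test-Utf8Bom $noBom     # False
```

Passing $false to the UTF8Encoding constructor is what suppresses the BOM. Full paths are used because .NET file methods resolve relative paths against the process working directory, which can differ from the current PowerShell location.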
The Get-Content cmdlet gets the content of the item at the location specified by the path, such as the text in a file or the content of a function. For files, the content is read one line at a time and returns a collection of objects, each of which represents a line of content.
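This line-by-line behaviour is most likely what causes both the lost line breaks and the error in the test script. A small sketch, assuming PowerShell 3.0 or later (for the -Raw switch) and an illustrative temp file:

```powershell
# Create a small sample file (illustrative path, written without a BOM).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'get-content-demo.txt'
[System.IO.File]::WriteAllText($path, "2014-04-01 - r5`r`n`r`nLine 1`r`nLine 2`r`n")

$lines = Get-Content -Path $path        # array of lines, line breaks stripped
$text  = Get-Content -Path $path -Raw   # one string, line breaks preserved

$lines.Count          # 4
$lines -is [array]    # True
$text  -is [string]   # True

# String + array converts the array to text joined with spaces,
# so the original line breaks are gone after concatenation.
$joined = "new`r`n" + $lines

# With an array on the left, -match filters the elements and does not
# populate $matches. With a single string it does.
if ($text -match '^[0-9-]{10} - r([0-9]+)') {
    [int]$matches[1]  # 5
}
```

Reading with -Raw keeps the content as one string, so prepending new text with + leaves the existing line breaks untouched.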
If the file is supposed to be UTF-8, why not read it with UTF-8 decoding:
Get-Content -Path test.txt -Encoding UTF8
JPBlanc is right: if you want the file read as UTF-8, specify that when the file is read.
On a side note, you're losing formatting with the [String]+[String] concatenation, and your regex match doesn't work. Check out the changed regex search, the changes made to $newMsgs, and the way I'm outputting your data to the file.
# Read data if exists
$data = ""
$startRev = 1;
if (Test-Path test.txt)
{
$data = Get-Content -Path test.txt #-Encoding UTF8
if($data -match "\br([0-9]+)\b"){
$startRev = [int]([regex]::Match($data,"\br([0-9]+)\b")).groups[1].value + 1
}
}
Write-Host Next revision is $startRev
# Define example data to add
$startRev = $startRev + 10
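# An expandable here-string keeps its real line breaks as written,
# so no `r`n escapes are needed (adding them would double the breaks).
$newMsgs = @"
2014-04-01 - r$startRev

Line 1
Line 2

"@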
# Write new data back
$newmsgs,$data | Out-File test.txt -Encoding UTF8
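If the goal is an unmodified round trip without a BOM, one alternative to the cmdlets is to call the .NET file methods directly. This is only a sketch, not the approach from the answer above. The function name Add-RevisionEntry is made up for illustration, and the revision logic is simplified compared to the test script (no + 10 step).

```powershell
# Hypothetical helper: prepends a new revision entry to a UTF-8 file
# without writing a BOM and without touching the existing content.
function Add-RevisionEntry {
    param([string]$Path)   # pass a full path; .NET ignores the PowerShell location

    $utf8NoBom = New-Object System.Text.UTF8Encoding $false
    $data = ''
    $rev  = 1

    if (Test-Path -LiteralPath $Path) {
        # ReadAllText returns the whole file as one string and skips a leading BOM.
        $data = [System.IO.File]::ReadAllText($Path, $utf8NoBom)
        if ($data -match '^[0-9-]{10} - r([0-9]+)') {
            $rev = [int]$matches[1] + 1
        }
    }

    $entry = "2014-04-01 - r$rev`r`n`r`nLine 1`r`nLine 2`r`n`r`n"

    # WriteAllText adds no trailing newline and, with this encoding, no BOM.
    [System.IO.File]::WriteAllText($Path, $entry + $data, $utf8NoBom)
}
```

Usage would look like Add-RevisionEntry -Path (Join-Path (Get-Location).Path 'test.txt'). Each run puts one new section at the top and leaves the rest of the file byte-for-byte as it was, apart from a BOM being dropped if the file had one.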