
PowerShell - Batch change files encoding To UTF-8

I'm trying to do a dead simple thing: change the encoding of files from anything to UTF-8 without a BOM. I found several scripts that do this, and the only one that really worked for me is this one: https://superuser.com/questions/397890/convert-text-files-recursively-to-utf-8-in-powershell#answer-397915.

It worked as expected, but I need the generated files without a BOM. So I modified the script a little, adding the solution given in this question: Using PowerShell to write a file in UTF-8 without the BOM

This is my final script:

foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    get-content $i | out-file -encoding $Utf8NoBomEncoding -filepath $dest
}

The problem is that PowerShell returns an error related to the System.Text.UTF8Encoding($False) object, complaining about an invalid parameter value:

It is not possible to validate the argument on the 'Encoding' parameter. The argument "System.Text.UTF8Encoding" does not belong to the set "unicode, utf7, utf8, utf32, ascii" specified by the ValidateSet attribute.

I wonder if I'm missing something, such as a required PowerShell version. I have never written a PowerShell script before, so I'm totally lost. I need to change the encoding of hundreds of files, and I'd rather not do it one by one.

I'm currently using version 2.0, which comes with Windows 7.

Thanks in advance!
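
For context on the error above: in PowerShell 2.0 the -Encoding parameter of Out-File takes one of a fixed set of encoding names, not a .NET Encoding object, which is why the UTF8Encoding instance is rejected. A minimal sketch of the usual workaround is to hand the encoding object to a .NET method instead. The temp-file name and the sample text are made up for illustration:

```powershell
# Hypothetical demo file in the temp folder; any writable path would do.
$demoFile = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo.txt'

# $false means "do not emit a byte order mark".
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($false)

# Sample lines; [char]0xE9 is an accented e, so the text is not pure ASCII.
$lines = [string[]]@("caf$([char]0xE9)", 'ok')

# The .NET method accepts the Encoding object that Out-File rejects.
[System.IO.File]::WriteAllLines($demoFile, $lines, $Utf8NoBomEncoding)

# A UTF-8 BOM would show up as the bytes EF BB BF at the start of the file.
$bytes = [System.IO.File]::ReadAllBytes($demoFile)
$hasBom = ($bytes.Length -ge 3) -and ($bytes[0] -eq 0xEF) -and
          ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF)
Write-Host "BOM present: $hasBom"
```

The byte check at the end is a quick way to confirm what any of the scripts in this thread actually wrote.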

EDIT 1

I tried the following code, suggested by @LarsTruijens and other posts:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = get-content $i
    [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
}

This throws an exception complaining about one of the parameters for WriteAllLines: "Exception calling 'WriteAllLines' with 3 arguments: The value can't be null. Parameter name: contents". The script creates all the folders, but they are all empty.

EDIT 2

An interesting thing about this error is that the "contents" argument does not appear to be null. If I output the value of the $content variable (using Write-Host), the lines are there. So why does it become null when passed to the WriteAllLines method?

EDIT 3

I've added a content check to the variable, so the script now looks like this:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = get-content $i

    if ( $content -ne $null ) {

        [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
    }
    else {
        Write-Host "No content from: $i"
    }
}

Now every iteration prints the "No content from: $i" message, even though the files aren't empty. There is one more error: "Get-Content: cannot find the path 'C:\root\FILENAME.php' because it does not exist." It seems to be looking for the files in the root directory rather than in the subfolders: it gets the file names from the child folders but tries to read them from the root.
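
A likely explanation for the symptom above: in Windows PowerShell, a file object returned by Get-ChildItem -Recurse can be converted to just its file name when it is passed as a path, so Get-Content resolves that name against the current directory instead of the subfolder. The sketch below does not rely on that conversion; it only shows that a bare name fails to resolve from the root while the FullName property does. The folder and file names are invented for the demo:

```powershell
# Build a throwaway tree: <temp>\enc-demo-<guid>\sub\page.php (hypothetical names).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('enc-demo-' + [guid]::NewGuid().ToString('N'))
$sub  = Join-Path $root 'sub'
New-Item -Path $sub -ItemType Directory -Force | Out-Null
[System.IO.File]::WriteAllText((Join-Path $sub 'page.php'), '<?php echo 1;')

# Pick up the file the same way the scripts above do.
$item = Get-ChildItem -Path $root -Recurse |
    Where-Object { -not $_.PSIsContainer } |
    Select-Object -First 1

# This is where a bare file name would point if the current directory were $root.
$byNameOnly     = Join-Path $root $item.Name
$nameOnlyExists = Test-Path -LiteralPath $byNameOnly      # file lives in 'sub', not in $root
$fullPathExists = Test-Path -LiteralPath $item.FullName   # FullName carries the subfolder

# Reading through FullName works regardless of the current directory.
$content = Get-Content -LiteralPath $item.FullName
Write-Host "Name only: $nameOnlyExists, FullName: $fullPathExists"
```

Passing $i.FullName to Get-Content in the loop avoids the ambiguity.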

EDIT 4 - Final Working Version

After some struggling, and following the advice I got here, especially from @LarsTruijens and @AnsgarWiechers, I finally made it work. I had to change the way I was getting the directory from $PWD and set fixed names for the folders. After that, it worked perfectly.

Here it is, for anyone who might be interested:

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
$source = "path"
$destination = "some_folder"

foreach ($i in Get-ChildItem -Recurse -Force) {
    if ($i.PSIsContainer) {
        continue
    }

    $path = $i.DirectoryName -replace $source, $destination
    $name = $i.Fullname -replace $source, $destination

    if ( !(Test-Path $path) ) {
        New-Item -Path $path -ItemType directory
    }

    $content = get-content $i.Fullname

    if ( $content -ne $null ) {

        [System.IO.File]::WriteAllLines($name, $content, $Utf8NoBomEncoding)
    } else {
        Write-Host "No content from: $i"   
    }
}
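
One caveat about the script above: -replace treats $source as a regular expression, so a source path containing backslashes or other regex metacharacters may not match as intended. Below is a hedged variant that builds the target path from the relative part of each file's full path instead. The function name Convert-TreeToUtf8NoBom and its parameters are hypothetical, not part of the original script:

```powershell
function Convert-TreeToUtf8NoBom {
    param(
        [string]$Source,       # folder to read from
        [string]$Destination   # folder to write the converted copies into
    )

    $utf8NoBom = New-Object System.Text.UTF8Encoding($false)

    # Work with absolute paths: .NET methods resolve relative paths against the
    # process working directory, which can differ from PowerShell's location.
    if (-not (Test-Path -LiteralPath $Destination)) {
        New-Item -Path $Destination -ItemType Directory -Force | Out-Null
    }
    $sourceRoot = (Resolve-Path -LiteralPath $Source).ProviderPath.TrimEnd([char[]]'\/')
    $destRoot   = (Resolve-Path -LiteralPath $Destination).ProviderPath

    foreach ($file in Get-ChildItem -LiteralPath $sourceRoot -Recurse -Force) {
        if ($file.PSIsContainer) { continue }

        # Plain string slicing instead of a regex replace.
        $relative  = $file.FullName.Substring($sourceRoot.Length).TrimStart([char[]]'\/')
        $target    = Join-Path $destRoot $relative
        $targetDir = Split-Path $target -Parent

        if (-not (Test-Path -LiteralPath $targetDir)) {
            New-Item -Path $targetDir -ItemType Directory -Force | Out-Null
        }

        # Get-Content yields nothing for an empty file; @() keeps it an array.
        $lines = @(Get-Content -LiteralPath $file.FullName)
        [System.IO.File]::WriteAllLines($target, [string[]]$lines, $utf8NoBom)
    }
}
```

A call might look like Convert-TreeToUtf8NoBom -Source 'C:\path' -Destination 'C:\some_folder', with both paths being placeholders. The destination should sit outside the source tree, otherwise the converted copies get picked up by the recursion. Get-Content still guesses the input encoding, so files without a BOM are read using the system default code page in Windows PowerShell.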
darksoulsong asked Sep 08 '13 14:09



2 Answers

I adapted a few snippets when I needed to UTF-8 encode a massive number of log files.

Note: this should not be used with -Recurse, because every converted file is written directly into the single destination folder.

write-host " "
$sourcePath = (get-location).path   # Use current folder as source.
# $sourcePath = "C:\Source-files"   # Use custom folder as source.
$destinationPath = (get-location).path + '\Out'   # Use "current folder\Out" as target.
# $destinationPath = "C:\UTF8-Encoded"   # Set custom target path

$cnt = 0

write-host "UTF8 conversion from " $sourcePath " to " $destinationPath

if (!(Test-Path $destinationPath))

{
  write-host "(Note: target folder created!) "
  new-item -type directory -path $destinationPath -Force | Out-Null
}

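Get-ChildItem -Path $sourcePath -Filter *.txt | ForEach-Object {
  $content = Get-Content $_.FullName
  # Use the file name explicitly rather than relying on how the file object converts to a string.
  # Note: in Windows PowerShell, -Encoding UTF8 writes a BOM.
  Set-Content (Join-Path -Path $destinationPath -ChildPath $_.Name) -Encoding UTF8 -Value $content
  $cnt++
}
write-host " "
write-host "Total: " $cnt " files converted!"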
write-host " "
pause
Jocke Svensson answered Nov 04 '22 04:11


You didn't follow the whole linked answer: you left out the WriteAllLines part.

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
foreach ($i in Get-ChildItem -Recurse) {
    if ($i.PSIsContainer) {
        continue
    }

    $dest = $i.Fullname.Replace($PWD, "some_folder")

    if (!(Test-Path $(Split-Path $dest -Parent))) {
        New-Item $(Split-Path $dest -Parent) -type Directory
    }

    $content = Get-Content $i.FullName   # full path, so files in subfolders are found
    [System.IO.File]::WriteAllLines($dest, $content, $Utf8NoBomEncoding)
}
Lars Truijens answered Nov 04 '22 06:11