
Replacing “smart quotes” in PowerShell

I'm somewhat stumped on a simple problem. I'm trying to remove fancy quotes from a bunch of text files. I have the following script, in which I've tried a number of different replacement methods, but without results.

Here's an example that downloads the data from GitHub and attempts the conversion.

$srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
$wc = New-Object net.WebClient
$wc.DownloadFile($srcUrl,"foo.txt")
$fancySingleQuotes = "[" + [string]::Join("",[char[]](0x2019, 0x2018)) + "]"

$c = Get-Content "foo.txt"
$c | % { `
        $_ = $_.Replace("’","'")
        $_ = $_.Replace("`“","`"")
        $_.Replace("`”","`"")       
    } `
    |  Set-Content "foo2.txt"

What's the trick for this to work?

Scott Weinstein asked Aug 06 '11 16:08


2 Answers

UPDATE: Fixed my answer (manojlds's comments were correct; the $_ assignment was a red herring). Here's a version that works, updated to incorporate your test code:

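    $srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
    $wc = New-Object net.WebClient

    # DownloadFile resolves relative paths against the process working directory,
    # which can differ from PowerShell's current location, so pass a full path.
    $path = Join-Path $PWD.Path "foo.txt"
    $wc.DownloadFile($srcUrl, $path)

    # \uXXXX is interpreted by the .NET regex engine, not by PowerShell.
    $fancySingleQuotes = "[\u2019\u2018]"
    $fancyDoubleQuotes = "[\u201C\u201D]"

    # Read explicitly as UTF-8 so the curly quotes are decoded correctly.
    $c = Get-Content $path -Encoding UTF8

    $c | % { `
        $_ = [regex]::Replace($_, $fancySingleQuotes, "'")
        [regex]::Replace($_, $fancyDoubleQuotes, '"')
    } `
    |  Set-Content "foo2.txt"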

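manojlds's version wasn't working for you because of how the file was decoded, not because of the replacement logic. The file you're getting from GitHub appears to be UTF-8 without a byte order mark, and without -Encoding, Get-Content in Windows PowerShell decodes such a file with the system's default ANSI code page. Each curly quote then arrives as several garbled characters, so neither the regex nor String.Replace finds anything to match. Reading the file in as UTF-8 fixes the problem.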
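
To see why the encoding matters, here is a minimal sketch that takes the UTF-8 bytes of a right single quote and decodes them two ways. It uses ISO-8859-1 (code page 28591) purely as a stand-in for a single-byte default encoding; the code page actually in effect on your machine is an assumption and may differ, although the effect is the same.

```powershell
# U+2019 (right single quotation mark), built from its code point so the
# example does not depend on how this script file itself is saved.
$quote = [string][char]0x2019

# UTF-8 stores that one character as three bytes: E2 80 99.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($quote)

# Assumption: ISO-8859-1 stands in for the session's single-byte default.
$latin1  = [System.Text.Encoding]::GetEncoding(28591)
$misread = $latin1.GetString($bytes)

# Decoding the same bytes as UTF-8 recovers the original character.
$decoded = [System.Text.Encoding]::UTF8.GetString($bytes)

$misread.Length            # 3
$misread.Contains($quote)  # False
$decoded -ceq $quote       # True
```

The misread string is three unrelated characters, which is why a pattern looking for U+2019 never matches until the file is read with -Encoding UTF8.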

E.Z. Hart answered Oct 27 '22 23:10


The following works on the input and output that you had given:

    $c = Get-Content $file
    $c | % { `
        $_ = $_.Replace("’","'")
        $_ = $_.Replace("`“","`"")
        $_.Replace("`”","`"")
    } `
    |  Set-Content $file
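
A variant of the same replacement, sketched with [char] code points instead of literal curly quotes, so the result does not depend on how the script file itself is saved. The function name ConvertTo-PlainQuote is hypothetical, chosen only for this illustration.

```powershell
# Hypothetical helper: maps curly quotes to their ASCII equivalents.
function ConvertTo-PlainQuote {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]   # Get-Content emits empty strings for blank lines
        [string]$Line
    )
    process {
        # U+2018 / U+2019 -> apostrophe (U+0027)
        $Line = $Line.Replace([char]0x2018, [char]0x27).Replace([char]0x2019, [char]0x27)
        # U+201C / U+201D -> straight double quote (U+0022)
        $Line.Replace([char]0x201C, [char]0x22).Replace([char]0x201D, [char]0x22)
    }
}

# Sample input built from code points: It’s “quoted”
$sample = "It" + [char]0x2019 + "s " + [char]0x201C + "quoted" + [char]0x201D
$result = $sample | ConvertTo-PlainQuote
$result   # It's "quoted"
```

Against a file, this could be used as `Get-Content $file -Encoding UTF8 | ConvertTo-PlainQuote | Set-Content "foo2.txt"`, assuming the input is UTF-8 as in the question.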

manojlds answered Oct 27 '22 21:10