I'm finding myself somewhat stumped on a simple problem. I'm trying to remove fancy (curly) quotes from a bunch of text files. I have the following script, where I've tried a number of different replacement methods, but without results.
Here's an example that downloads the data from GitHub and attempts the conversion.
$srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
$wc = New-Object net.WebClient
$wc.DownloadFile($srcUrl,"foo.txt")
$fancySingleQuotes = "[" + [string]::Join("",[char[]](0x2019, 0x2018)) + "]"
$c = Get-Content "foo.txt"
$c | % { `
$_ = $_.Replace("’","'")
$_ = $_.Replace("`“","`"")
$_.Replace("`”","`"")
} `
| Set-Content "foo2.txt"
What's the trick for this to work?
UPDATE: Fixed my answer (manojlds' comments were correct; the $_ thing was a red herring). Here's a version that works, and I've updated it to incorporate your testing code:
$srcUrl="https://raw.github.com/gist/1129778/d4d899088ce7da19c12d822a711ab24e457c023f/gistfile1.txt"
$wc = New-Object net.WebClient
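# Use one full path for both the download and the read. DownloadFile resolves
# relative paths against the process working directory, not the PowerShell location.
$path = "C:\Users\hartez\SO6968270\foo.txt"
$wc.DownloadFile($srcUrl, $path)
$fancySingleQuotes = "[\u2019\u2018]"
$fancyDoubleQuotes = "[\u201C\u201D]"
$c = Get-Content $path -Encoding UTF8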
$c | % { `
$_ = [regex]::Replace($_, $fancySingleQuotes, "'")
[regex]::Replace($_, $fancyDoubleQuotes, '"')
} `
| Set-Content "foo2.txt"
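The reason that manojlds' version wasn't working for you is that the file you're getting from GitHub was being read with an encoding that doesn't match its contents, so the curly quotes never reached the replacement as the Unicode characters it was looking for. Without -Encoding, Get-Content in Windows PowerShell assumes the system's ANSI code page for a file that has no byte-order mark. Reading the file in as UTF-8 fixes the problem.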
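If you want to see the mismatch for yourself, here's a minimal sketch that doesn't touch the file at all. It takes a string containing U+2019, turns it into UTF-8 bytes, and decodes those bytes two ways. The sample text and variable names are made up for illustration, and ISO-8859-1 is only an assumed stand-in for whatever single-byte ANSI code page your machine defaults to:

```powershell
# Build the sample text from a code point so this script's own encoding does not matter.
$rsquo = [string][char]0x2019
$text  = "it" + $rsquo + "s"

# The bytes as they would sit in a UTF-8 file: U+2019 becomes E2 80 99.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($text)

# Decode the same bytes two ways.
# ISO-8859-1 is a stand-in for a single-byte ANSI code page (an assumption).
$asUtf8   = [System.Text.Encoding]::UTF8.GetString($bytes)
$asLatin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetString($bytes)

$fancySingleQuotes = "[\u2019\u2018]"

[regex]::IsMatch($asUtf8, $fancySingleQuotes)       # True
[regex]::IsMatch($asLatin1, $fancySingleQuotes)     # False
[regex]::Replace($asUtf8, $fancySingleQuotes, "'")  # it's
$asLatin1.Length                                    # 6 (one char per byte)
```

The wrongly decoded string has six characters instead of four because the three bytes of the curly quote each became a separate character, which is why neither the regex nor String.Replace ever finds a match.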
The following works on the input and output that you had given:
$c = Get-Content $file
$c | % { `
$_ = $_.Replace("’","'")
$_ = $_.Replace("`“","`"")
$_.Replace("`”","`"")
} `
| Set-Content $file