I'm using Ruby to read and then print a file to stdout, redirecting the output to a file in Windows PowerShell.
However, when I inspect the files, I get this for the input:
PS D:> head -n 1 .\inputfile
<text id="http://observer.guardian.co.uk/osm/story/0,,1009777,00.html"> <s> Hooligans NNS hooligan
, , , unbridled JJ unbridled passion NN passion
- : - and CC and no DT no executive JJ executiv
e boxes NNS box . SENT . </s>
... yet this for the output:
PS D:> head -n 1 .\outputfile
ÿ_< t e x t i d = " h t t p : / / o b s e r v e r . g u a r d i a n . c o . u k / o s m / s t o r y / 0 , , 1 0 0 9 7 7 7 , 0
0 . h t m l " > < s > H o o l i g a n s N N S h o o l i g a n , ,
, u n b r i d l e d J J u n b r i d l e d p a s s i o n N N p a s s i o n
- : - a n d C C a n d n o D T n o e x e c u t i v e J J
e x e c u t i v e b o x e s N N S b o x . S E N T . < / s >
How can this happen?
Edit: since my problem didn't have anything to do with Ruby, I've removed the Ruby code and included my usage of the Windows shell instead.
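In Windows PowerShell, > is effectively the same as | Out-File, and Out-File defaults to Unicode encoding (UTF-16 LE with a byte order mark). That is why the output file starts with two stray characters and appears to have a gap after every letter: each ASCII character is stored as two bytes, the second of which is a null byte. Try this instead of using >: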
... | Out-File outputfile -encoding ASCII
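To see why the redirected file looks the way it does, here is a minimal Ruby sketch that mimics the encoding step. It assumes the redirect writes UTF-16 LE with a leading byte order mark, as Windows PowerShell's Unicode default does. The helper name utf16le_with_bom and the shortened sample string are made up for illustration and are not part of the original script.

```ruby
# Hypothetical helper: mimic a writer that emits UTF-16 LE
# preceded by a byte order mark (U+FEFF).
def utf16le_with_bom(text)
  ("\uFEFF" + text).encode("UTF-16LE")
end

# Shortened stand-in for the first line of the input file.
sample = '<text id="x"> <s>'
encoded = utf16le_with_bom(sample)

# The first two bytes are the BOM (FF FE); after that, every ASCII
# character is followed by a 00 byte, which a byte-oriented viewer
# such as head shows as a gap between letters.
puts encoded.bytes.first(6).map { |b| format("%02X", b) }.join(" ")
# prints "FF FE 3C 00 74 00"

# Converting back to UTF-8 and dropping the BOM recovers the original text.
restored = encoded.encode("UTF-8").delete_prefix("\uFEFF")
puts restored == sample
# prints "true"
```

If a file has already been written this way, reading it in Ruby with an explicit external encoding, for example File.read(path, mode: "rb:BOM|UTF-16LE:UTF-8"), should give the same round trip. Treat that as a suggestion to verify against your Ruby version rather than a guaranteed fix.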