Notepad++ shows UCS-2LE while Ubuntu's file [file] shows UTF-16LE: why the different names?

Tags:

notepad++

encoding

utf-8

ucs2

utf-16le

I am trying to convert a file generated by mssql to UTF-8. When I open the mssql output in Notepad++ on Windows Server 2003, it recognises the file as UCS-2LE. When I copied the file to an Ubuntu machine, file [file] reported the encoding as UTF-16LE. I am really confused: the names are different, so there must be some difference between the encodings, but why do I see both for the same file? It is a .csv file generated from an mssql query.

852

asked Jul 31 '12 08:07

tough

1 Answer

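For the most part, UTF-16 and UCS-2 are the same thing. For text that stays within the Basic Multilingual Plane (code points up to U+FFFF) the bytes are identical. They differ only for characters above U+FFFF, which UTF-16 encodes as a surrogate pair (two 16-bit units) and UCS-2 cannot represent at all.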

What it means in practice is that each character is stored as a two-byte (16-bit) unit. "LE" stands for little endian, i.e. the low byte of each two-byte unit is stored first.
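To make the byte layout concrete, here is a minimal Python sketch. The helper names `utf16le_units` and `fits_in_ucs2` are made up for illustration; only the built-in `utf-16-le` codec is relied on. It shows the low byte coming first, and that a character above U+FFFF needs two 16-bit units in UTF-16.

```python
def utf16le_units(text: str) -> list[int]:
    """Return the 16-bit code units of `text` as stored in UTF-16LE."""
    data = text.encode("utf-16-le")  # no BOM is written by this codec
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def fits_in_ucs2(text: str) -> bool:
    """True if every character is in the BMP, i.e. one 16-bit unit each."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# "A" (U+0041) and "e with acute" (U+00E9): low byte first, one unit each.
print("A\u00e9".encode("utf-16-le").hex(" "))            # 41 00 e9 00

# U+1F600 is outside the BMP, so UTF-16 uses a surrogate pair.
print([hex(u) for u in utf16le_units("\U0001F600")])     # ['0xd83d', '0xde00']
print(fits_in_ucs2("A\u00e9"), fits_in_ucs2("\U0001F600"))  # True False
```

For a typical CSV export containing only BMP characters, both labels describe exactly the same bytes, which is why the two tools can disagree on the name without either being wrong.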

If you want to convert to UTF-8, in Notepad++ click Convert to UTF-8 in the Encoding menu, then save.

If your other programs choke on the file after doing this, or you see a few garbage characters at the start of the file (the byte order mark, often displayed as ï»¿), then click Convert to UTF-8 without BOM instead.
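If you would rather script the conversion (for example on the Ubuntu machine), the same steps can be sketched in Python. This is only an illustration under stated assumptions: the input really is UTF-16LE, with or without a leading BOM, and the function names and file names below are hypothetical.

```python
import codecs
from pathlib import Path


def utf16le_bytes_to_utf8(data: bytes, write_bom: bool = False) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8, optionally prefixing a UTF-8 BOM."""
    # Drop the UTF-16LE byte order mark (FF FE) if the file starts with one.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    utf8 = data.decode("utf-16-le").encode("utf-8")
    # The UTF-8 BOM is the three bytes EF BB BF.
    return codecs.BOM_UTF8 + utf8 if write_bom else utf8


def convert_file(src: str, dst: str, write_bom: bool = False) -> None:
    """Read `src` as UTF-16LE and write it to `dst` as UTF-8."""
    Path(dst).write_bytes(utf16le_bytes_to_utf8(Path(src).read_bytes(), write_bom))


# Hypothetical usage; the file names are placeholders:
# convert_file("export.csv", "export-utf8.csv")
```

Leaving `write_bom` at `False` corresponds to the "without BOM" menu option, which is usually the safer choice for CSV consumers on Linux.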

130

answered Dec 27 '22 11:12

BenW
