
Can the French and Spanish special chars be held in a varchar?

French and Spanish have special chars in them that are not used in normal English (accented vowels and such).

Are those chars supported in a varchar? Or do I need a nvarchar for them?

(NOTE: I do NOT want a discussion on if I should use nvarchar or varchar.)

Vaccano asked Aug 24 '11 21:08




1 Answer

What SQL Implementation(s) are you talking about?

I can speak about Microsoft SQL Server; other SQL implementations, not so much.

For Microsoft SQL Server, the default collation on a US English installation is SQL_Latin1_General_CP1_CI_AS (Latin1 General, code page 1252, case-insensitive, accent-sensitive); other locales may install with a different default. It allows the round-trip representation of most Western European languages, French and Spanish included, in single-byte form (varchar) rather than the two-bytes-per-character form (nvarchar).
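As a rough check of the round-trip claim outside the database, the sketch below encodes some French and Spanish text with Python's cp1252 codec (Python's name for Windows-1252) and decodes it again. The sample strings and the helper name round_trips are illustrative assumptions, and this only models the code page, not SQL Server's own conversion or comparison rules.

```python
import unicodedata

# Illustrative sample text (not from the original answer).
SAMPLES = {
    "French": "àâæçéèêëîïôœùûüÿ ÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸ «»",
    "Spanish": "áéíóúüñ ÁÉÍÓÚÜÑ ¿¡",
}

def round_trips(text, codec="cp1252"):
    """True if every character fits in one byte and survives encode/decode."""
    # Normalize so each accented letter is a single precomposed code point.
    text = unicodedata.normalize("NFC", text)
    try:
        encoded = text.encode(codec)
    except UnicodeEncodeError:
        return False
    return len(encoded) == len(text) and encoded.decode(codec) == text

results = {lang: round_trips(text) for lang, text in SAMPLES.items()}
print(results)  # {'French': True, 'Spanish': True}
```

Text outside the code page, such as Japanese, makes round_trips return False, which is the case where nvarchar would be needed.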

It's built on the "Windows 1252" code page. That code page is effectively ISO-8859-1 with the code point range 0x80–0x9F being represented by an alternate set of glyphs, including the Euro symbol at 0x80. ISO-8859-1 specifies that code point range as control characters, which have no graphical representation.
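The difference between the two encodings can be seen with Python's cp1252 and latin-1 codecs, used here as stand-ins for Windows-1252 and ISO-8859-1. This is a minimal sketch; the helper try_decode is a hypothetical name, and Python's cp1252 table leaves a few bytes in the range undefined, which the helper reports as None.

```python
def try_decode(byte_value, codec):
    """Decode one byte, returning None where the codec leaves it undefined."""
    try:
        return bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return None

# Bytes in 0x80-0x9F that cp1252 maps to a printable character.
remapped = {}
for b in range(0x80, 0xA0):
    ch = try_decode(b, "cp1252")
    if ch is not None:
        remapped[b] = ch

euro_cp1252 = try_decode(0x80, "cp1252")   # the Euro sign
euro_latin1 = try_decode(0x80, "latin-1")  # the C1 control character U+0080

# Outside 0x80-0x9F the two codecs agree byte for byte.
other_bytes = list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
same_elsewhere = all(
    try_decode(b, "cp1252") == try_decode(b, "latin-1") for b in other_bytes
)

print(repr(euro_cp1252), repr(euro_latin1), same_elsewhere)
```

One practical consequence: French œ, Œ and Ÿ live in that remapped range, so they fit in Windows-1252 but not in strict ISO-8859-1.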

ISO-8859-1 consists of the first 256 characters of Unicode's Basic Multilingual Plane, covering the entire domain of an 8-bit character (0x00–0xFF). For details and comparison, see:

  • Unicode C0 Controls and Basic Latin
  • Unicode C1 Controls and Latin-1 Supplement
  • Windows-1252 Code Page
  • ISO-8859-1
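The statement that ISO-8859-1 is the first 256 Unicode code points can be checked directly: decoding byte n with Python's latin-1 codec should give the character whose code point is n. A short sketch (variable names are illustrative):

```python
# Every possible 8-bit value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 is Python's name for ISO-8859-1.
decoded = all_bytes.decode("latin-1")

# Each byte value should equal the code point of the character it decodes to.
is_identity = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

print(len(decoded), is_identity)  # 256 True
```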

European languages that will have a hard time with this code page include (but aren't necessarily limited to) Latvian, Lithuanian, Polish, Czech and Slovak, because several of their letters are missing from Windows-1252. If you need to support those, you'll either need to use a different collation (SQL Server offers a plethora of collations), or move to using nvarchar.
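To see which letters are the problem, the sketch below lists the characters in a sample string that a given code page cannot encode. The sample strings and the helper unencodable are illustrative assumptions. The alternative code pages (cp1250 for Central European, cp1257 for Baltic) are what the corresponding regional SQL Server collations are typically built on, but check the collation you actually pick.

```python
import unicodedata

def unencodable(text, codec):
    """Return the sorted, distinct characters of text that codec cannot encode."""
    text = unicodedata.normalize("NFC", text)
    missing = set()
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)

# language -> (illustrative letters, a regional single-byte code page)
SAMPLES = {
    "Polish": ("ąćęłńóśźż", "cp1250"),
    "Czech": ("ěščřžýáíéúůďťň", "cp1250"),
    "Slovak": ("ľščťžýáíéúäôňďĺŕ", "cp1250"),
    "Latvian": ("āčēģīķļņšūž", "cp1257"),
    "Lithuanian": ("ąčęėįšųūž", "cp1257"),
}

for lang, (letters, regional) in SAMPLES.items():
    print(lang, unencodable(letters, "cp1252"), unencodable(letters, regional))
```

Each language has at least one letter that Windows-1252 rejects, while the regional code page covers the whole sample. Note that a varchar column holds only one code page at a time, so mixing, say, French and Polish text in one column is a case for nvarchar.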

One should note that mixing collations within a database tends to cause problems. Deviating from the default collation should be done only when necessary and with an understanding of how you can shoot yourself in the foot with it.

I suspect Oracle and DB2 provide similar support. I don't know about MySQL or other implementations.

Nicholas Carey answered Oct 25 '22 08:10