
How to select rows that contain non-English characters in SQL Server 2005 (it should filter only non-English characters, not special characters)

Tags:

sql-server

My table has a column that contains non-English text (characters in different languages) as well as special characters. I need to filter out only the rows with non-English characters. Rows that merely contain special characters should not be filtered out.

I tried several methods, but each one failed on a few rows. Could someone please help me with this? Thanks in advance.

Example: the column LOCATION contains the following rows:

row 1: துய இம்மானுவேல் தேவாலயம், North Street, Idyanvillai, Tamil Nadu, India

row 2: Dr.Hakim M.Asgar Ali's ROY MEDICAL CENTRE™ Unani Clinic In Kerala India, Thycaud Hospital Road, Opp. Amritha Hotel,, Thycaud.P.O.,, Thiruvananthapuram, Kerala, India

row 3: ಕಾಳಿಕಾಂಬ ದೇವಿ ದೇವಸ್ಥಾನ, Shivaji Nagar, Davangere, Karnataka, India

The rows above contain characters in several languages. Can anyone help me select only row 2? Thanks.

shivadarshan asked Jan 15 '14 13:01





1 Answer

T-SQL's string-handling capability is pretty rudimentary.

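If the "non-English" values are distinguished by needing Unicode (NVARCHAR, which SQL Server stores as UTF-16), you can try something like

SELECT * FROM MyTable WHERE MyField = CAST(MyField AS VARCHAR(MAX))

to pull only the rows that survive a round trip through VARCHAR. In SQL Server 2005, VARCHAR uses the code page of the collation (for example Windows-1252 under the Latin1 collations), not UTF-8. Characters with no equivalent in that code page are converted to '?', so those rows no longer compare equal to the original. Give the CAST an explicit length: CAST(... AS VARCHAR) with no length truncates to 30 characters, which would make every longer value fail the comparison.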
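As a rough illustration of the round-trip comparison, here is a sketch that runs it against the three rows from the question. The table variable @Places, its column names, and the explicit Latin1_General_CI_AS collation are assumptions made for the example; the real table and collation may differ, and the result depends on which code page the collation maps to.

```sql
-- Hypothetical table variable standing in for the asker's table.
-- The collation is pinned so that VARCHAR conversion uses code page 1252 (an assumption).
DECLARE @Places TABLE (
  Id INT,
  Location NVARCHAR(400) COLLATE Latin1_General_CI_AS
);

-- One INSERT per row: multi-row VALUES lists are not available in SQL Server 2005.
-- The N prefix keeps the literals as Unicode.
INSERT INTO @Places (Id, Location) VALUES (1, N'துய இம்மானுவேல் தேவாலயம், North Street, Idyanvillai, Tamil Nadu, India');
INSERT INTO @Places (Id, Location) VALUES (2, N'Dr.Hakim M.Asgar Ali''s ROY MEDICAL CENTRE™ Unani Clinic In Kerala India, Thycaud Hospital Road, Opp. Amritha Hotel,, Thycaud.P.O.,, Thiruvananthapuram, Kerala, India');
INSERT INTO @Places (Id, Location) VALUES (3, N'ಕಾಳಿಕಾಂಬ ದೇವಿ ದೇವಸ್ಥಾನ, Shivaji Nagar, Davangere, Karnataka, India');

-- Keep rows whose text is unchanged after conversion to the code page.
-- The explicit length avoids the 30-character default of CAST(... AS VARCHAR).
SELECT Id, Location
FROM @Places
WHERE Location = CAST(Location AS VARCHAR(400));
```

Under these assumptions the query should return only row 2: the ™ sign exists in code page 1252, while the Tamil and Kannada characters do not and are converted to '?'.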

The only way I know how to test whether a field is drawn from an arbitrary set of characters is with a user-defined function, like this:

CREATE FUNCTION dbo.IsAllowed (@input VARCHAR(MAX)) RETURNS BIT
-- Returns 1 if every character of the string is allowed, 0 otherwise
-- (an empty or NULL string also returns 0).
-- Usages: SELECT dbo.IsAllowed('Hello'); -- returns 1
--         SELECT dbo.IsAllowed('Hello, world!'); -- returns 0
-- Note CHARINDEX is not case sensitive under a case-insensitive collation,
--      so @allowables doesn't need both cases.
--      VARCHAR(MAX) is different under SQL Server 2005 than 2008+
--      and use of defined VARCHAR size might be necessary.
AS
BEGIN
  -- SQL Server 2005 cannot initialize a variable in DECLARE, so assign with SET.
  DECLARE @allowables char(26);
  DECLARE @allowed bit;
  DECLARE @index int;
  SET @allowables = 'abcdefghijklmnopqrstuvwxyz';
  SET @allowed = 0;
  SET @index = 1;
  WHILE @index <= LEN(@input)
    BEGIN
    -- Stop at the first character that is not in the allowed set.
    IF CHARINDEX(SUBSTRING(@input,@index,1),@allowables)=0
      BEGIN
      SET @allowed = 0;
      BREAK;
      END
    ELSE
      BEGIN
      SET @allowed = 1;
      SET @index = @index+1;
      END
    END
  RETURN @allowed
END

User-defined functions can be applied to columns in a query, for example in the WHERE clause:

SELECT * FROM MyTable WHERE dbo.IsAllowed(MyField) = 1

Note the schema name (dbo in this case) is not optional when calling scalar user-defined functions.
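For the rows in the question, the allowed set would need more than the 26 letters. The following is a sketch of one possible variant, not a definitive implementation: the function name dbo.IsAllowedText, the NVARCHAR(4000) parameter size, and the particular allow-list are all assumptions chosen for illustration. It compares with a binary collation so that matching is by exact code point, which is why both letter cases are listed.

```sql
-- Hypothetical variant of the function above; the name and allow-list are assumptions.
CREATE FUNCTION dbo.IsAllowedText (@input NVARCHAR(4000)) RETURNS BIT
-- Returns 0 if any character falls outside the allow-list, 1 otherwise
-- (an empty or NULL string returns 1, since no disallowed character is found).
AS
BEGIN
  -- SQL Server 2005 cannot initialize a variable in DECLARE, so assign with SET.
  DECLARE @allowables NVARCHAR(100);
  DECLARE @index INT;
  -- Letters in both cases, digits, space, some punctuation, and the trademark sign.
  SET @allowables = N'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .,''-/&()™';
  SET @index = 1;
  -- DATALENGTH / 2 counts trailing spaces, which LEN would ignore.
  WHILE @index <= DATALENGTH(@input) / 2
  BEGIN
    -- The binary collation makes the lookup an exact code-point match.
    IF CHARINDEX(SUBSTRING(@input, @index, 1) COLLATE Latin1_General_BIN, @allowables) = 0
      RETURN 0;
    SET @index = @index + 1;
  END
  RETURN 1;
END
```

It would be called the same way, for example SELECT * FROM MyTable WHERE dbo.IsAllowedText(MyField) = 1. With this allow-list, row 2 from the question contains only listed characters, while the Tamil and Kannada rows do not. Any other punctuation that appears in the real data would have to be added to @allowables.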

If a T-SQL user-defined function is inadequate, you can also use a CLR function, which lets you apply a regular expression or other .NET string logic to a column. Because they break portability and pose a security risk, many sysadmins don't allow CLR functions. (This includes Microsoft's SQL Azure product.)

Robert Calhoun answered Sep 17 '22 15:09
