
RoR character class regex

I have the following line of code in my Ruby on Rails app, which checks whether the given string contains Korean characters or not:

isKorean = !/\p{Hangul}/.match(word).nil?

It works perfectly in the console, but raises a syntax error in the actual app:

invalid character property name {Hangul}: /\p{Hangul}/

What am I missing and how can I get it to work?

Arnold asked Mar 18 '12 04:03




1 Answer

This is a character encoding issue. You need to add:

# encoding: utf-8

to the top of the Ruby file that uses that regex (it must be the first line, or the second if the file starts with a shebang). You could probably use any other encoding that supports the character property you need, but UTF-8 is the usual choice. Note that since Ruby 2.0, UTF-8 is the default source encoding, so this comment is not needed in Ruby 2.0+.

This is known as a "magic comment". You can and should read more about encodings in Ruby 1.9. Note that encoding in Rails views is handled automatically by config.encoding (set to UTF-8 by default in config/application.rb).
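A minimal sketch of a Ruby file with the magic comment in place, assuming Ruby 1.9 or later. The `TextUtils` module and its `korean?` method are hypothetical names used only for illustration; the regex is the one from the question.

```ruby
# encoding: utf-8
# The magic comment above must be the first line of the file (or the second,
# after a shebang). On Ruby 1.9 it switches the source encoding from US-ASCII
# to UTF-8, so the \p{Hangul} script property is recognized in regex literals.

# Hypothetical helper module, not part of Rails.
module TextUtils
  HANGUL = /\p{Hangul}/

  # Returns true if the string contains at least one Hangul character.
  def self.korean?(word)
    !(word =~ HANGUL).nil?
  end
end

puts TextUtils.korean?("안녕하세요")  # => true
puts TextUtils.korean?("hello")       # => false
puts TextUtils.korean?("hello 세계")  # => true
```

On Ruby 2.0 and later the magic comment is redundant but harmless, so the same file works on both.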

The regex was likely working in the console because your terminal is already set to use UTF-8.
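To see which encodings are in play, a small diagnostic like the following can help (a sketch, assuming Ruby 1.9+). It prints the source encoding of the file it lives in and the default external encoding, which is usually derived from the terminal locale. Running it from a file and then evaluating the same lines in `irb` may show why the two environments behave differently.

```ruby
# encoding: utf-8
# Sketch of an encoding diagnostic.
# __ENCODING__ is the source encoding of the file it appears in: whatever the
# magic comment says, otherwise the interpreter default (US-ASCII on Ruby 1.9,
# UTF-8 on Ruby 2.0+).
puts "Source encoding:  #{__ENCODING__}"

# Encoding.default_external is usually derived from the locale
# (e.g. LANG=en_US.UTF-8), which is what a console session typically inherits.
puts "Default external: #{Encoding.default_external}"

# A string literal containing Hangul takes on the file's source encoding.
puts "Literal encoding: #{'한'.encoding}"
```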

Andrew Marshall answered Oct 16 '22 11:10