How do you use unicode characters within a regular expression in Ruby?

Question

I am trying to write a line of code that takes a line of Japanese text and deletes a certain set of characters. However, I am having trouble using Unicode characters inside the regular expression.

I am currently using text.gsub(/《.*?》/u, ''), but I get this error:

'gsub': invalid byte sequence in Windows-31J (ArgumentError)

Can anyone tell me what I am doing incorrectly?

Example text: その仕草《しぐさ》があまりに無造作《むぞうさ》だったので

Expected result: その仕草があまりに無造作だったので

Thanks

Edit: the magic comment # encoding: utf-8 is present at the top of the script.

Limbo Peng · Accepted Answer

Try this:

text.encode('utf-8', 'utf-8').gsub(/《.*?》/u, '')
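One plausible reading of the error is that the bytes in text are UTF-8 but Ruby has labelled the string as Windows-31J (for example because that is the default external encoding on the machine), so gsub fails when it tries to interpret the bytes. The sketch below simulates that situation and relabels the string with force_encoding, which is a swapped-in alternative to the encode call above rather than the answerer's exact method. The variable names raw and mislabelled are made up for illustration, and the mislabelling itself is an assumption about the asker's setup.

```ruby
# Assumes the script itself is saved as UTF-8 (the default source
# encoding since Ruby 2.0, or declared with "# encoding: utf-8").

# Hypothetical input: the bytes are UTF-8, but the string carries the
# Windows-31J label. This simulates the assumed cause of the error.
raw = "その仕草《しぐさ》があまりに無造作《むぞうさ》だったので"
mislabelled = raw.dup.force_encoding('Windows-31J')

# Relabel the same bytes as UTF-8. force_encoding does not convert
# anything; it only changes how Ruby interprets the existing bytes.
text = mislabelled.dup.force_encoding('UTF-8')

# The pattern contains non-ASCII characters, so the regexp is UTF-8 and
# now matches the string's encoding. The non-greedy .*? removes each
# 《...》 span separately.
result = text.gsub(/《.*?》/, '')
puts result
```

If the text comes from a file, passing an explicit encoding when reading it, for example File.read(path, encoding: 'UTF-8') with path standing in for the real file name, may avoid the wrong label in the first place.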

Tags: regex, ruby, unicode

SomberClock
