How do you use Unicode characters within a regular expression in Ruby?

I am trying to write a line of code that takes a line of Japanese text and deletes a certain set of characters. However, I am having trouble using Unicode characters inside the regular expression.

I am currently using text.gsub(/《.*?》/u, ''), but I get the error:

'gsub': invalid byte sequence in Windows-31J (ArgumentError)

Can anyone tell me what I am doing incorrectly?

Example text: その仕草《しぐさ》があまりに無造作《むぞうさ》だったので

Expected result: その仕草があまりに無造作だったので

Thanks

Edit: the magic comment # encoding: utf-8 is present at the top of the script.

asked Feb 06 '26 18:02 by SomberClock


1 Answer

Try this:

text.encode('utf-8', 'utf-8').gsub(/《.*?》/u, '')
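
The error message suggests that the string is tagged as Windows-31J (often the default external encoding on Japanese Windows) while its bytes are actually UTF-8, so the match is rejected before the pattern is even tried. Assuming that diagnosis holds, here is a minimal sketch of an alternative fix that relabels the bytes with force_encoding instead of calling encode. The method name strip_readings and the simulated mislabelled input are made up for illustration and are not part of the original question.

```ruby
# encoding: utf-8

# Hypothetical helper (not from the original answer): relabel the bytes as
# UTF-8 without transcoding, then delete every 《...》 group.
# Assumption: the bytes really are valid UTF-8 and only the label is wrong.
def strip_readings(text)
  utf8 = text.dup.force_encoding('UTF-8')
  raise ArgumentError, 'input is not valid UTF-8' unless utf8.valid_encoding?
  utf8.gsub(/《.*?》/, '')
end

# Simulate the mislabelled input: UTF-8 bytes tagged as Windows-31J.
raw = 'その仕草《しぐさ》があまりに無造作《むぞうさ》だったので'.dup.force_encoding('Windows-31J')

puts strip_readings(raw)
```

With the example text from the question this should print the expected result. If the text comes from a file, passing the encoding when reading it, for example File.read(path, encoding: 'UTF-8'), may avoid the wrong label in the first place.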
answered Feb 09 '26 07:02 by Limbo Peng


