Python regex issue with Unicode (Japanese) characters

Tags: python, regex, unicode

I want to remove part of the string below: the first 加護亜依, which precedes the romanized name (shown in bold in the original post). The string is stored in the variable oldString.

[DMSM-8433] 加護亜依 Kago Ai – 加護亜依 vs. FRIDAY

I'm using the following regex in Python:

p=re.compile(ur"( [\W]+) (?=[A-Za-z ]+–)", re.UNICODE)
newString=p.sub("", oldString)

When I output newString, nothing has been removed.

asked Sep 30 '15 10:09

Paul Thomas

1 Answer

You can use the following snippet to solve the issue:

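#!/usr/bin/python
# -*- coding: utf-8 -*-
import re

# Python 2: the u'' prefix makes both the input and the pattern Unicode strings.
str = u'[DMSM-8433] 加護亜依 Kago Ai – 加護亜依 vs. FRIDAY'
# Explicit ranges: CJK punctuation, hiragana, katakana, full-width/half-width forms, kanji.
regex = u'[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]+ (?=[A-Za-z ]+–)'
p = re.compile(regex, re.U)
# Remove the Japanese run (and its trailing space) that precedes the romanized name and the dash.
match = p.sub("", str)
print match.encode("UTF-8")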

See IDEONE demo

Besides the # -*- coding: utf-8 -*- declaration, I have added @nhahtdh's character class to match Japanese characters.
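As a rough illustration of why the explicit character class helps, here is a minimal sketch adapted to Python 3 (an assumption on my part; the answer above targets Python 2). With Unicode matching, kanji count as word characters, so the `[\W]+` in the question's pattern cannot match them, while the explicit ranges can. The variable names are only illustrative.

```python
import re

text = "[DMSM-8433] 加護亜依 Kago Ai – 加護亜依 vs. FRIDAY"

# In Python 3, str patterns use Unicode matching by default (the same effect
# as re.UNICODE in Python 2), so kanji are word characters: \w matches them
# and \W does not.
kanji_is_word = re.fullmatch(r"\w+", "加護亜依") is not None
kanji_is_nonword = re.fullmatch(r"\W+", "加護亜依") is not None
print(kanji_is_word, kanji_is_nonword)

# The pattern from the question therefore finds nothing to remove.
original = re.compile(r"( [\W]+) (?=[A-Za-z ]+–)")
unchanged = original.sub("", text)

# Explicit ranges for Japanese text, as in the answer's character class.
japanese = "[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]+"
fixed = re.compile(japanese + r" (?=[A-Za-z ]+–)")
result = fixed.sub("", text)
print(result)
```

Only the first 加護亜依 is removed, because the lookahead requires Latin letters or spaces followed by the dash; the second occurrence is followed by "vs." and is left alone.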

Note that match needs to be encoded as a UTF-8 byte string "manually" before printing, since Python 2 needs to be "reminded" that we are working with Unicode all the time.
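For comparison, a small sketch of the same distinction in Python 3 (again an assumption; it is not part of the original answer). There, `str` is already Unicode text, and `.encode("utf-8")` is only needed when you actually want bytes, for example to write to a binary file or socket.

```python
# The substituted string from the answer, as a Python 3 str (Unicode text).
result = "[DMSM-8433] Kago Ai – 加護亜依 vs. FRIDAY"

# Encoding turns text into bytes; non-ASCII characters take several bytes each.
encoded = result.encode("utf-8")
print(type(encoded).__name__)
print(len(encoded) > len(result))

# Decoding with the same codec restores the original text.
decoded = encoded.decode("utf-8")
print(decoded == result)
```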

answered Oct 09 '22 20:10

Wiktor Stribiżew
