Python: How can I replace full-width characters with half-width characters?

If this were PHP, I would probably do something like this:

function no_more_half_widths($string){
  // Full-width digits to search for, and their half-width replacements
  $foo = array('1','2','3','4','5','6','7','8','9','10');
  $bar = array('1','2','3','4','5','6','7','8','9','10');
  return str_replace($foo, $bar, $string);
}

I have tried the .translate function in Python, and it reports that the arrays are not the same size. I assume this is because the individual characters are encoded in UTF-8. Any suggestions?

Darren White McGee asked Mar 11 '10 02:03



1 Answer

The built-in unicodedata module can do it:

>>> import unicodedata
>>> foo = u'1234567890'
>>> unicodedata.normalize('NFKC', foo)
u'1234567890'

“NFKC” stands for “Normalization Form KC [Compatibility Decomposition, followed by Canonical Composition]”. It replaces full-width characters with their half-width counterparts, which Unicode treats as compatibility equivalents.

Note that it also normalizes all sorts of other things at the same time, such as separate accent marks (which are composed into single characters) and Roman numeral symbols (which are expanded into ordinary letters).
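If those side effects are unwanted, one possible alternative is a narrower mapping built for str.translate. The following is only a sketch, assuming Python 3 (where str.translate accepts a dict keyed by code point); the names FULLWIDTH_TO_ASCII and fullwidth_to_halfwidth are made up for illustration and are not part of any library. It relies on the full-width ASCII variants U+FF01 to U+FF5E sitting at a fixed offset of 0xFEE0 from ASCII U+0021 to U+007E.

```python
import unicodedata

# Full-width ASCII variants occupy U+FF01..U+FF5E, a fixed offset of
# 0xFEE0 above their ASCII counterparts U+0021..U+007E.
# (Hypothetical helper names, chosen for this example only.)
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
FULLWIDTH_TO_ASCII[0x3000] = 0x20  # ideographic space -> ordinary space


def fullwidth_to_halfwidth(text):
    """Map only full-width ASCII variants; leave every other character alone."""
    return text.translate(FULLWIDTH_TO_ASCII)


# Full-width "123", the Roman numeral four symbol, and "e" + combining acute.
sample = "\uff11\uff12\uff13 \u2163 e\u0301"

# NFKC rewrites all three parts of the sample.
print(unicodedata.normalize("NFKC", sample))

# The translate table rewrites only the full-width digits.
print(fullwidth_to_halfwidth(sample))
```

With NFKC the Roman numeral becomes the letters "IV" and the accent is composed into a single "é"; with the translate table both are left untouched. The table does not cover half-width katakana or other width variants, so NFKC is still the simpler choice when the extra normalization is acceptable.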

Daniel Newby answered Oct 05 '22 11:10