 

How to fix PrettyTable to display Chinese characters properly

from prettytable import PrettyTable

header="乘客姓名,性别,出生日期".split(",")
x = PrettyTable(header)
x.align["乘客姓名"]="l"
table='''HuangTianhui,男,1948/05/28
姜翠云,女,1952/03/27
李红晶,女,1994/12/09
LuiChing,女,1969/08/02
宋飞飞,男,1982/03/01
唐旭东,男,1983/08/03
YangJiabao,女,1988/08/25
买买提江·阿布拉,男,1979/07/10
安文兰,女,1949/10/20
胡偲婠(婴儿),女,2011/02/25
(有待确定姓名),男,1985/07/20
'''
data=[row for row in table.split("\n") if row]
for row in data:
    x.add_row(row.strip().split(","))

print(x)

[Image: actual output of the code above]

The output format I want is the following:

[Image: desired output format]

In this example, prettytable.py cannot properly display the character · in 买买提江·阿布拉, because that character has an ambiguous width. How can I fix this bug in prettytable.py?

I have added two lines to the function _char_block_width(char) in prettytable.py, but the problem remains:

if char == 0xb7:
    return 2 

I have solved it: the file prettytable.py has to be installed directly in d:\python33\Lib\site-packages on my computer, not as d:\python33\Lib\site-packages\prettytable\prettytable.py.
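The fix above came down to Python importing a different prettytable.py from the one being edited. A quick way to check which file an import will actually load is importlib.util.find_spec from the standard library (Python 3.4+; on Python 3.3, running import prettytable and then print(prettytable.__file__) shows the same thing). The wrapper name module_origin is only illustrative.

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would load for a top-level module, or None."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None when no top-level module with that name is found.
    return spec.origin if spec is not None else None


# Path of the prettytable module that `import prettytable` would load,
# or None if it is not installed.
print(module_origin("prettytable"))
```

If the printed path is not the file you patched, your edits are not being used.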

There are many Chinese characters with ambiguous width, and it is clumsy to add two lines like the following for each one: with 50 ambiguous characters, 100 lines would have to be added to prettytable.py. Is there a simpler way, changing just a few lines so that all ambiguous-width characters are handled?

if char == 0xb7:
    return 2 
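One way to avoid a line per character is to look up each character's East Asian Width property with the standard unicodedata module instead of hard-coding code points. The sketch below is a hypothetical stand-alone width function, not prettytable's own code; like the patch above, it assumes it receives an integer code point, and it treats ambiguous ('A') characters as double-width because this console draws · as full-width. How it fits into a given copy of prettytable.py depends on that version's internals.

```python
import unicodedata


def char_block_width(codepoint, ambiguous_wide=True):
    """Return the display width (0, 1 or 2) of one code point.

    Hypothetical helper illustrating the idea; not taken from prettytable.
    """
    ch = chr(codepoint)
    # Combining marks (e.g. accents) do not occupy a cell of their own.
    if unicodedata.combining(ch):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    # 'W' (wide) and 'F' (fullwidth) always take two cells.
    if eaw in ("W", "F"):
        return 2
    # 'A' (ambiguous) depends on the font, so the caller decides.
    if eaw == "A":
        return 2 if ambiguous_wide else 1
    # 'Na', 'H' and 'N' are counted as one cell.
    return 1


def str_block_width(text, ambiguous_wide=True):
    """Sum the display widths of all characters in text."""
    return sum(char_block_width(ord(ch), ambiguous_wide) for ch in text)


print(str_block_width("买买提江·阿布拉"))                        # 16
print(str_block_width("买买提江·阿布拉", ambiguous_wide=False))  # 15
```

The ambiguous_wide flag keeps the choice in one place: set it to False for a console font that draws ambiguous characters as half-width.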
asked Apr 12 '14 07:04 by showkey



1 Answer

The issue you're running into has to do with the dot character in the incorrectly padded line of your Python output. The dot is Unicode code point U+00B7 MIDDLE DOT (·). This character is considered to have an "ambiguous" width: it is a narrow character in most non-East-Asian fonts, but is rendered full-width in most Asian ones. Without context, a program can't tell how wide it will appear on the screen. Unfortunately, Python's Unicode support has no way to provide that context: the unicodedata module can report that the character is ambiguous, but not how your console will actually draw it.
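To see the property being described, Python's standard unicodedata module reports the East Asian Width class of each character: 'W' for wide, 'Na' for narrow and 'A' for ambiguous. The helper name width_classes is only illustrative. Note that 'A' merely flags the ambiguity; it does not say how the terminal will render the character.

```python
import unicodedata


def width_classes(text):
    """Return (character, East Asian Width class) pairs for text."""
    return [(ch, unicodedata.east_asian_width(ch)) for ch in text]


for ch, eaw in width_classes("江·A"):
    print(ch, eaw)
# 江 W
# · A
# A Na
```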

One fix might be to replace the offending dot with one that has an unambiguous width, such as U+30FB KATAKANA MIDDLE DOT (・), which is always full-width. This way the padding logic will be able to recognize that extra space is needed for that line.
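A minimal sketch of this fix, applied to the question's data before the rows reach PrettyTable. The helper name normalize_middle_dot is hypothetical; it swaps U+00B7 for U+30FB, whose East Asian Width class is 'W', so width logic based on that property counts it as two cells. Keep in mind that this changes the text itself, which may matter if the names are later compared or stored.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"           # ·  ambiguous width ('A')
KATAKANA_MIDDLE_DOT = "\u30fb"  # ・ always wide ('W')


def normalize_middle_dot(field):
    """Replace the ambiguous-width middle dot with the katakana middle dot."""
    return field.replace(MIDDLE_DOT, KATAKANA_MIDDLE_DOT)


row = "买买提江·阿布拉,男,1979/07/10".split(",")
cleaned = [normalize_middle_dot(field) for field in row]
print(cleaned)  # ['买买提江・阿布拉', '男', '1979/07/10']
print(unicodedata.east_asian_width(KATAKANA_MIDDLE_DOT))  # W
```

In the question's loop this would become x.add_row([normalize_middle_dot(f) for f in row.strip().split(",")]).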

Another solution could be to set your console to use a font that gives the middle dot its Western, half-width treatment, rather than the current one, which follows the East Asian style of rendering it full-width. The existing padding will then be correct. Your output from R clearly uses a different font than the Python output does, and that font renders the dot as half-width.

answered Oct 20 '22 01:10 by Blckknght