
Python TypeError: expected a character buffer object, personal misunderstanding

Tags: python, unicode

I was stuck on this error for a long time:

 TypeError: expected a character buffer object

I have just understood what I got wrong: it comes down to the difference between a unicode string and a "simple" string. I was calling the code below with a "normal" string where I had to pass a unicode one, so forgetting the "u" prefix before the string literal was breaking execution.

That said, the TypeError message was very unclear to me, and it still is.

Can someone please explain what I was missing, and why a "simple" string is NOT "a character buffer object"?

You can reproduce the error with the code below (extracted from, and (c) of, here: )

def maketransU(s1, s2, todel=u""):
    """Build translation table for use with unicode.translate().

    :param s1: string of characters to replace.
    :type s1: unicode
    :param s2: string of replacement characters (same order as in s1).
    :type s2: unicode
    :param todel: string of characters to remove.
    :type todel: unicode
    :return: translation table with character code -> character code.
    :rtype: dict
    """
    # We go unicode internally - ensure callers are ok with that.
    assert (isinstance(s1,unicode))
    assert (isinstance(s2,unicode))
    trans_tab = dict( zip( map(ord, s1), map(ord, s2) ) )
    trans_tab.update( (ord(c),None) for c in todel )
    return trans_tab

#BlankToSpace_table = string.maketrans (u"\r\n\t\v\f",u"     ")
BlankToSpace_table = maketransU (u"\r\n\t\v\f",u"     ")
def BlankToSpace(text) :
    """Replace blank characters with real spaces.

    May be useful to prepare text for whitespace-based regular expressions & co.

    :param  text: the text to clean of blank characters.
    :type  text: unicode
    :return: the text with each blank character replaced by a space.
    :rtype: unicode
    """
    print text, type(text), len(text)
    try:
        out =  text.translate(BlankToSpace_table)
    except TypeError, e:
        raise
    return out

# for SO : the code below is just to reproduce what i did not understand
dummy = "Hello,\n, this is a \t dummy test!"
for s in (unicode(dummy), dummy):
    print repr(s)
    print repr(BlankToSpace(s))

producing :

u'Hello,\n, this is a \t dummy test!'
Hello,
, this is a      dummy test! <type 'unicode'> 32
u'Hello, , this is a   dummy test!'
'Hello,\n, this is a \t dummy test!'
Hello,
, this is a      dummy test! <type 'str'> 32

Traceback (most recent call last):
  File "C:/treetaggerwrapper.error.py", line 44, in <module>
    print repr(BlankToSpace(s))
  File "C:/treetaggerwrapper.error.py", line 36, in BlankToSpace
    out =  text.translate(BlankToSpace_table)
TypeError: expected a character buffer object
user1340802 asked Dec 16 '22 00:12


1 Answer

The issue is that the translate method of a bytestring is different from the translate method of a unicode string. Here's the docstring of the non-unicode version:

S.translate(table [,deletechars]) -> string

Return a copy of the string S, where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table, which must be a string of length 256.

And here's the unicode version:

S.translate(table) -> unicode

Return a copy of the string S, where all characters have been mapped through the given translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted.

You can see that the non-unicode version is expecting "a string of length 256", whereas the unicode version is expecting a "mapping" (i.e. a dict). So the problem is not that your unicode string is a buffer object and the non-unicode one isn't (both are buffers, of course) but that one translate method expects such a buffer object as its table argument and the other doesn't. Your byte string was handed a dict where it wanted a 256-character table, hence "expected a character buffer object".
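To make the difference concrete, here is a minimal sketch that builds one table of each kind and then deliberately passes the wrong one. The variable names are illustrative and not taken from the question's code. It is written to run on both Python 2.7 and Python 3, where the same split exists between `bytes` and `str`; the exact wording of the TypeError message differs between the two versions.

```python
# Sketch only: all names here are illustrative.
text_u = u"Hello,\n, this is a \t dummy test!"   # unicode string
text_b = b"Hello,\n, this is a \t dummy test!"   # byte string (plain str on Python 2)

blanks = u"\r\n\t\v\f"

# Unicode strings: translate() takes a mapping of ordinals to ordinals.
unicode_table = dict((ord(c), ord(u" ")) for c in blanks)

# Byte strings: translate() takes a 256-byte table, one entry per byte value.
raw = bytearray(range(256))        # identity table: byte n maps to byte n
for c in blanks:
    raw[ord(c)] = ord(" ")         # remap each blank character to a space
byte_table = bytes(raw)

result_u = text_u.translate(unicode_table)
result_b = text_b.translate(byte_table)
print(repr(result_u))
print(repr(result_b))

# Passing the mapping to the byte string reproduces the TypeError.
error_message = None
try:
    text_b.translate(unicode_table)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError: %s" % error_message)
```

In the question's code, the simplest fix is probably to keep the dict table and make sure `BlankToSpace` only ever receives unicode strings, for example by decoding byte strings before calling it.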

Daniel Roseman answered Apr 07 '23 18:04