I was stuck on this error for a long time:
TypeError: expected a character buffer object
I have just understood what I had misunderstood: it comes down to the difference between a unicode string and a "simple" string. I had tried to use this code with a "normal" string, while I had to pass a unicode one. So forgetting the simple "u" before the string was breaking execution :/ !!!
BTW the TypeError was very unclear to me, and still is.
Please, can someone explain what I was missing, and why a "simple" string is NOT "a character buffer object"?
You can reproduce it with the code below (extracted and (c) from here: )
def maketransU(s1, s2, todel=u""):
    """Build translation table for use with unicode.translate().
    :param s1: string of characters to replace.
    :type s1: unicode
    :param s2: string of replacement characters (same order as in s1).
    :type s2: unicode
    :param todel: string of characters to remove.
    :type todel: unicode
    :return: translation table with character code -> character code.
    :rtype: dict
    """
    # We go unicode internally - ensure callers are ok with that.
    assert (isinstance(s1,unicode))
    assert (isinstance(s2,unicode))
    trans_tab = dict( zip( map(ord, s1), map(ord, s2) ) )
    trans_tab.update( (ord(c),None) for c in todel )
    return trans_tab

#BlankToSpace_table = string.maketrans (u"\r\n\t\v\f",u"     ")
BlankToSpace_table = maketransU (u"\r\n\t\v\f",u"     ")

def BlankToSpace(text) :
    """Replace blanks characters by realspaces.
    May be good to prepare for regular expressions & Co based on whitespaces.
    :param text: the text to clean from blanks.
    :type text: string
    :return: List of parts in their apparition order.
    :rtype: [ string ]
    """
    print text, type(text), len(text)
    try:
        out = text.translate(BlankToSpace_table)
    except TypeError, e:
        raise
    return out

# for SO : the code below is just to reproduce what i did not understand
dummy = "Hello,\n, this is a \t dummy test!"
for s in (unicode(dummy), dummy):
    print repr(s)
    print repr(BlankToSpace(s))
This produces:
u'Hello,\n, this is a \t dummy test!'
Hello,
, this is a dummy test! <type 'unicode'> 32
u'Hello, , this is a   dummy test!'
'Hello,\n, this is a \t dummy test!'
Hello,
, this is a dummy test! <type 'str'> 32
Traceback (most recent call last):
File "C:/treetaggerwrapper.error.py", line 44, in <module>
print repr(BlankToSpace(s))
File "C:/treetaggerwrapper.error.py", line 36, in BlankToSpace
out = text.translate(BlankToSpace_table)
TypeError: expected a character buffer object
The issue is that the translate method of a bytestring is different from the translate method of a unicode string. Here's the docstring of the non-unicode version:
S.translate(table [,deletechars]) -> string
Return a copy of the string S, where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table, which must be a string of length 256.
And here's the unicode version:
S.translate(table) -> unicode
Return a copy of the string S, where all characters have been mapped through the given translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted.
You can see that the non-unicode version is expecting "a string of length 256", whereas the unicode version is expecting a "mapping" (ie a dict). So the problem is not that your unicode string is a buffer object and the non-unicode one isn't - of course, both are buffers - but that one translate method is expecting such a buffer object as its table and the other isn't. Your table is a dict, which is not a character buffer, so the non-unicode translate rejects it with that TypeError.
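To make the difference concrete, here is a minimal sketch (not the original author's code) that builds one table of each kind and picks the right one for the string it is given. The names blank_to_space, UNICODE_TABLE, BYTE_TABLE, text_type and make_byte_table are made up for this illustration. The version check is an assumption so that the same file runs on Python 2 (string.maketrans, unicode) and on Python 3 (bytes.maketrans, str).

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: text is str, byte strings are bytes.
    text_type = str
    make_byte_table = bytes.maketrans
else:
    # Python 2: text is unicode, byte strings are str.
    import string
    text_type = unicode  # only evaluated on Python 2
    make_byte_table = string.maketrans

BLANKS = u"\r\n\t\v\f"

# Table for text strings: a mapping of code point -> code point.
UNICODE_TABLE = dict((ord(c), ord(u" ")) for c in BLANKS)

# Table for byte strings: a 256-byte string, one entry per byte value.
BYTE_TABLE = make_byte_table(b"\r\n\t\v\f", b" " * 5)


def blank_to_space(text):
    """Replace blank characters by spaces, for either kind of string."""
    if isinstance(text, text_type):
        return text.translate(UNICODE_TABLE)
    return text.translate(BYTE_TABLE)


print(repr(blank_to_space(u"Hello,\n, this is a \t dummy test!")))
print(repr(blank_to_space(b"Hello,\n, this is a \t dummy test!")))
```

Passing UNICODE_TABLE to the byte-string translate is exactly the mistake in the question: that method wants the 256-byte table, not a dict.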