how to split a unicode string into list [duplicate]

Question

I have the following code:

stru = "۰۱۲۳۴۵۶۷۸۹"
strlist = stru.decode("utf-8").split()
print strlist[0]

My output is:

۰۱۲۳۴۵۶۷۸۹

But when I use:

print strlist[1]

I get the following traceback:

IndexError: list index out of range

My question is: how can I split my string? Keep in mind that I get the string from a function, so treat it as a variable rather than a literal.

chryss · Accepted Answer

By default, the split() method splits on whitespace. Your string contains no whitespace, so strlist is a list with a single element: the whole string, in strlist[0].
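A minimal sketch of this behavior, assuming Python 3, where str is already Unicode and the question's .decode("utf-8") step is therefore dropped:

```python
# Sketch for Python 3: str is already Unicode, so no .decode() step.
stru = "۰۱۲۳۴۵۶۷۸۹"

# With no argument, split() cuts only at runs of whitespace.
parts = stru.split()
print(len(parts))          # 1: the digits contain no whitespace
print(parts[0] == stru)    # True: the single element is the whole string

# Only whitespace-separated input yields several items.
spaced = "۰ ۱ ۲"
print(len(spaced.split()))  # 3
```

This is why strlist[1] raises IndexError in the question: there is no second element.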

If you want a list with one element for each Unicode code point, you can convert the string into a list in several ways:

Function: list(stru.decode("utf-8"))
List comprehension: [item for item in stru.decode("utf-8")]
Don't convert at all. Do you really need a list? You can iterate over the unicode string just like any other sequence type (for character in stru.decode("utf-8"): ...)
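The three options above, sketched side by side. This assumes Python 3; in Python 2 you would use stru.decode("utf-8") wherever stru appears:

```python
# Sketch for Python 3, where str is already Unicode.
# In Python 2, replace stru with stru.decode("utf-8").
stru = "۰۱۲۳۴۵۶۷۸۹"

# Option 1: the list() constructor, one element per code point.
chars = list(stru)

# Option 2: an equivalent list comprehension.
chars_lc = [item for item in stru]

# Option 3: skip the list and iterate over the string directly.
for character in stru:
    print("U+%04X" % ord(character))  # U+06F0, U+06F1, ..., U+06F9

print(len(chars), chars == chars_lc)  # 10 True
```

Options 1 and 2 build the same list. Option 3 avoids the extra copy when you only need to visit each character once.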

Ignacio Vazquez-Abrams · Answer

You don't need to.

>>> print u"۰۱۲۳۴۵۶۷۸۹"[1]
۱

If you still want to...

>>> list(u"۰۱۲۳۴۵۶۷۸۹")
[u'\u06f0', u'\u06f1', u'\u06f2', u'\u06f3', u'\u06f4', u'\u06f5', u'\u06f6', u'\u06f7', u'\u06f8', u'\u06f9']
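The u"..." literal is what makes this work: indexing a unicode string yields whole characters, while indexing the undecoded UTF-8 bytes yields fragments of them. A sketch of the difference, assuming Python 3, where bytes play the role of a Python 2 str literal:

```python
# Sketch for Python 3: bytes stand in for a Python 2 str literal.
text = "۰۱۲۳۴۵۶۷۸۹"
raw = text.encode("utf-8")  # the UTF-8 bytes of the string

print(len(raw))   # 20: each of these digits takes 2 bytes in UTF-8
print(len(text))  # 10: one element per code point

# Indexing bytes returns a single byte (an int in Python 3), not a character.
print(raw[1])     # 176: the second byte of the first digit

# Decode first, then index: you get the second digit.
print(raw.decode("utf-8")[1] == "\u06f1")  # True
```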

Tags: python, string, unicode, utf-8, unicode-string

Asked by PersianGulf