Why does Java String.length() give a different result than Python len() for the same string?

Question

I have a string like the following:

("استنفار" OR "الأستنفار" OR "الاستنفار" OR "الإستنفار" OR "واستنفار" OR "باستنفار" OR "لستنفار" OR "فاستنفار" OR "والأستنفار" OR "بالأستنفار" OR "للأستنفار" OR "فالأستنفار" OR "والاستنفار" OR "بالاستنفار" OR "فالاستنفار" OR "والإستنفار" OR "بالإستنفار" OR "للإستنفار" OR "فالإستنفار" OR "إستنفار" OR "أستنفار" OR "إلأستنفار" OR "ألأستنفار" OR "إلاستنفار" OR "ألاستنفار" OR "إلإستنفار" OR "ألإستنفار") (("قوات سعودية" OR "قوات سعوديه" OR "القوات سعودية" OR "القوات سعوديه") OR ("القواتالسعودية" OR "القواتالسعوديه" OR "إلقواتالسعودية" OR "ألقواتالسعودية" OR "إلقواتالسعوديه" OR "ألقواتالسعوديه")("القوات السعودية" OR "إلقوات السعودية" OR "ألقوات السعودية" OR "والقوات السعودية" OR "بالقوات السعودية" OR "للقوات السعودية" OR "فالقوات السعودية" OR "وإلقوات السعودية" OR "بإلقوات السعودية" OR "لإلقوات السعودية" OR "فإلقوات السعودية" OR "وألقوات السعودية" OR "بألقوات السعودية" OR "لألقوات السعودية" OR "فألقوات السعودية") OR )

If I store it in a Java String variable and count the number of characters, I get 923, but Python's len function gives me 1514.

What is the difference here?

falsetru · Accepted Answer

It seems that in Python (2.x) you are counting the byte length, not the number of characters.

Convert the byte string into a unicode object using str.decode, then count the characters:

len(byte_string_object.decode('utf-8'))

You may also need to strip surrounding whitespace:

len(byte_string_object.decode('utf-8').strip())

>>> len('استنفار')  # string (byte-string) literal
14
>>> len(u'استنفار')  # unicode literal
7
>>> len('استنفار'.decode('utf-8'))  # string -> unicode
7
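
For readers on Python 3, the same distinction can be sketched by encoding the text explicitly. This is only an illustration and assumes the Python 2 byte string was UTF-8 encoded; the variable names are made up for the example. It also shows why the byte count in the question is less than twice the character count: ASCII characters such as quotes, spaces, and OR take one byte each, while each Arabic letter takes two.

```python
# Python 3 sketch: len() on str counts code points, len() on bytes counts bytes.
word = "استنفار"
encoded = word.encode("utf-8")  # roughly what a plain '...' literal holds in Python 2, assuming UTF-8

char_count = len(word)                      # characters (code points)
byte_count = len(encoded)                   # UTF-8 bytes
round_trip = len(encoded.decode("utf-8"))   # bytes -> str, then count characters

# A mixed ASCII + Arabic fragment taken from the question's query string.
fragment = '"استنفار" OR'
fragment_chars = len(fragment)
fragment_bytes = len(fragment.encode("utf-8"))

print(char_count, byte_count, round_trip)   # 7 14 7
print(fragment_chars, fragment_bytes)       # 12 19
```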

Kristopher Wagner · Answer

This is because you are running Python 2.x. In Python 2.x, strings are bytes by default, while in Python 3.x they are Unicode by default, as they are in Java. For example, if you open the python3 interpreter and type

len("استنفار")

you will get 7, while typing the same line into the python2 interpreter gives 14.
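
One caveat on the comparison with Java, sketched here in Python 3: Java's String.length() reports UTF-16 code units, whereas Python 3's len() reports code points. The two agree for the Arabic text in the question, but they can differ for characters outside the Basic Multilingual Plane. The helper name utf16_code_units is hypothetical and exists only for this illustration.

```python
def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units, the unit that Java's String.length() reports."""
    # "utf-16-le" emits no byte-order mark, so every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


arabic = "استنفار"
emoji = "\U0001F600"  # a character outside the Basic Multilingual Plane

# Arabic letters each fit in one UTF-16 code unit, so both counts match.
print(len(arabic), utf16_code_units(arabic))  # 7 7

# The emoji is one code point but needs a surrogate pair in UTF-16.
print(len(emoji), utf16_code_units(emoji), len(emoji.encode("utf-8")))  # 1 2 4
```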

Tags:

java

python

string

Fanooos
