I am looking for a more efficient way to sum up the ASCII values of all characters in a given string, using only standard Python (2.7 preferred).
Currently I have:
print sum(ord(ch) for ch in text)
I want to emphasize that the main focus of this question is the code above.
The following background is a less important aspect of the question and should be treated as such:
So why am I asking? I have compared this approach against embedding a simple C function that does the same thing using PyInline, and the embedded C function turns out to be about 17 times faster.
If no pure-Python approach is faster than what I have suggested, it seems strange that the Python developers haven't added such an implementation to the core.
Current results for the suggested answers, on my Windows 7, i7 machine with Python 2.7:
text = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
sum(ord(ch) for ch in text)
>> 0.00521324663262
sum(array.array("B", text))
>> 0.0010040770317
sum(map(ord, text ))
>> 0.00427160369234
sum(bytearray(text))
>> 0.000864669402933
C-code embedded:
>> 0.000272828426841
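For reference, the "C-code embedded" entry refers to something along these lines. This is only a minimal sketch of such a PyInline embedding (the function name sum_chars and its body are illustrative, not the exact code that was timed):

import PyInline

# Build a small C helper in-process; sum_chars is a hypothetical name.
m = PyInline.build(code="""
long sum_chars(char *s) {
    long total = 0;
    while (*s)
        total += (unsigned char) *s++;
    return total;
}
""", language="C")

print m.sum_chars("aaaa")  # 4 * 97 = 388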
This would be the easiest:

print sum(map(ord, my_string))
You can use an intermediate bytearray to speed things up:
>>> sum(bytearray("abcdefgh"))
804
This is not 17 times faster than the generator (it involves the creation of an intermediate bytearray, and sum still has to iterate over Python integer objects), but on my machine it does speed up summing an 8-character string from 2 μs to about 700 ns. If a timing in this ballpark is still too inefficient for your use case, you should probably write the speed-critical parts of your application in C anyway.
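To see why no explicit ord calls are needed here: in Python 2, iterating over a bytearray yields plain integers, which is exactly what sum consumes:

>>> list(bytearray("abc"))
[97, 98, 99]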
If your strings are sufficiently large, and if you can use numpy, you can avoid creating temporary copies by directly referring to the string's buffer using numpy.frombuffer:
>>> import numpy as np
>>> np.frombuffer("abcdefgh", "uint8").sum()
804
For smaller strings this is slower than a temporary array because of the complexities in numpy's view-creation machinery. However, for sufficiently large strings the frombuffer approach starts to pay off, and it of course always creates less garbage. On my machine the cutoff point is a string size of about 200 characters.
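If you want to find that crossover on your own machine, a small timeit sweep like the following sketch will do it (the sizes and loop count here are arbitrary choices):

import timeit

# Compare the bytearray and frombuffer approaches at several string
# sizes; the crossover point varies by machine and numpy version.
for n in (8, 50, 200, 1000, 10000):
    setup = 'import numpy as np; s = "a" * %d' % n
    t_ba = timeit.timeit('sum(bytearray(s))', setup=setup, number=10000)
    t_np = timeit.timeit('np.frombuffer(s, "uint8").sum()', setup=setup, number=10000)
    print n, t_ba, t_np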
Also, see Guido's classic essay Python Optimization Anecdote. While some of its specific techniques may by now be obsolete, the general lesson of how to think about Python optimization is still quite relevant.
You can time the different approaches with the timeit module:
$ python -m timeit -s 's = "a" * 20' 'sum(ord(ch) for ch in s)'
100000 loops, best of 3: 3.85 usec per loop
$ python -m timeit -s 's = "a" * 20' 'sum(bytearray(s))'
1000000 loops, best of 3: 1.05 usec per loop
$ python -m timeit -s 'from numpy import frombuffer; s = "a" * 20' \
'frombuffer(s, "uint8").sum()'
100000 loops, best of 3: 4.8 usec per loop
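The same measurements can be scripted from Python rather than the shell; a minimal equivalent of the runs above (absolute numbers will of course differ from machine to machine):

import timeit

setup = 's = "a" * 20'
print timeit.timeit('sum(ord(ch) for ch in s)', setup=setup, number=100000)
print timeit.timeit('sum(bytearray(s))', setup=setup, number=100000)
print timeit.timeit('np.frombuffer(s, "uint8").sum()',
                    setup='import numpy as np; ' + setup, number=100000)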
You can speed it up a bit (roughly 40%, though nowhere near native C speed) by avoiding the creation of the generator: map does its iteration in C instead of resuming a generator frame for each character.
Instead of:
sum(ord(c) for c in string)
Do:
sum(map(ord, string))
Timings:
>>> timeit.timeit(stmt="sum(map(ord, 'abcdefgh'))")
# TP: 1.5709713941578798
# JC: 1.425781011581421
>>> timeit.timeit(stmt="sum(ord(c) for c in 'abcdefgh')")
# TP: 1.7807035140629637
# JC: 1.9981679916381836