Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python string formatting: is '%' more efficient than 'format' function?

I wanted to compare different to build a string in Python from different variables:

  • using + to concatenate (referred to as 'plus')
  • using %
  • using "".join(list)
  • using format function
  • using "{0.<attribute>}".format(object)

I compared for 3 types of scenari

  • string with 2 variables
  • string with 4 variables
  • string with 4 variables, each used twice

I measured 1 million operations of each time and performed an average over 6 measures. I came up with the following timings:

Timings

In each scenario, I came up with the following conclusion

  • Concatenation seems to be one of the fastest method
  • Formatting using % is much faster than formatting with format function

I believe format is much better than % (e.g. in this question) and % was almost deprecated.

I have therefore several questions:

  1. Is % really faster than format?
  2. If so, why is that?
  3. Why is "{} {}".format(var1, var2) more efficient than "{0.attribute1} {0.attribute2}".format(object)?

For reference, I used the following code to measure the different timings.

import time def timing(f, n, show, *args):     if show: print f.__name__ + ":\t",     r = range(n/10)     t1 = time.clock()     for i in r:         f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args); f(*args)     t2 = time.clock()     timing = round(t2-t1, 3)     if show: print timing     return timing       class values(object):     def __init__(self, a, b, c="", d=""):         self.a = a         self.b = b         self.c = c         self.d = d       def test_plus(a, b):     return a + "-" + b  def test_percent(a, b):     return "%s-%s" % (a, b)  def test_join(a, b):     return ''.join([a, '-', b])          def test_format(a, b):     return "{}-{}".format(a, b)  def test_formatC(val):     return "{0.a}-{0.b}".format(val)       def test_plus_long(a, b, c, d):     return a + "-" + b + "-" + c + "-" + d  def test_percent_long(a, b, c, d):     return "%s-%s-%s-%s" % (a, b, c, d)          def test_join_long(a, b, c, d):     return ''.join([a, '-', b, '-', c, '-', d])      def test_format_long(a, b, c, d):     return "{0}-{1}-{2}-{3}".format(a, b, c, d)  def test_formatC_long(val):     return "{0.a}-{0.b}-{0.c}-{0.d}".format(val)       def test_plus_long2(a, b, c, d):     return a + "-" + b + "-" + c + "-" + d + "-" + a + "-" + b + "-" + c + "-" + d  def test_percent_long2(a, b, c, d):     return "%s-%s-%s-%s-%s-%s-%s-%s" % (a, b, c, d, a, b, c, d)      def test_join_long2(a, b, c, d):     return ''.join([a, '-', b, '-', c, '-', d, '-', a, '-', b, '-', c, '-', d])              def test_format_long2(a, b, c, d):     return "{0}-{1}-{2}-{3}-{0}-{1}-{2}-{3}".format(a, b, c, d)  def test_formatC_long2(val):     return "{0.a}-{0.b}-{0.c}-{0.d}-{0.a}-{0.b}-{0.c}-{0.d}".format(val)   def test_plus_superlong(lst):     string = ""     for i in lst:         string += str(i)     return string       def test_join_superlong(lst):     return "".join([str(i) for i in lst])       def mean(numbers):     return float(sum(numbers)) / max(len(numbers), 1)           nb_times = int(1e6) n = xrange(5) lst_numbers = xrange(1000) from collections import defaultdict metrics = defaultdict(list) list_functions = [     test_plus, test_percent, test_join, test_format, test_formatC,     test_plus_long, test_percent_long, test_join_long, test_format_long, test_formatC_long,     test_plus_long2, test_percent_long2, test_join_long2, test_format_long2, test_formatC_long2,     # test_plus_superlong, test_join_superlong, ] val = values("123", "456", "789", "0ab") for i in n:     for f in list_functions:         print ".",         name = f.__name__         if "formatC" in name:             t = timing(f, nb_times, False, val)         elif '_long' in name:             t = timing(f, nb_times, False, "123", "456", "789", "0ab")         elif '_superlong' in name:             t = timing(f, nb_times, False, lst_numbers)         else:             t = timing(f, nb_times, False, "123", "456")         metrics[name].append(t)   # Get Average print "\n===AVERAGE OF TIMINGS===" for f in list_functions:     name = f.__name__     timings = metrics[name]     print "{:>20}:\t{:0.5f}".format(name, mean(timings)) 
like image 339
Jean-Francois T. Avatar asked Nov 21 '16 07:11

Jean-Francois T.


People also ask

Is F string better than format?

As of Python 3.6, f-strings are a great new way to format strings. Not only are they more readable, more concise, and less prone to error than other ways of formatting, but they are also faster!

What is the benefit of Python format function?

Python's str. format() technique of the string category permits you to try and do variable substitutions and data formatting. This enables you to concatenate parts of a string at desired intervals through point data format.

Is string formatting important in Python?

Python string class gives us an important built-in command called format() that helps us to replace, substitute, or convert the string with placeholders with valid values in the final string.

What is the use of '%' in Python?

The % symbol in Python is called the Modulo Operator. It returns the remainder of dividing the left hand operand by right hand operand. It's used to get the remainder of a division problem.


1 Answers

  1. Yes, % string formatting is faster than the .format method
  2. most likely (this may have a much better explanation) due to % being a syntactical notation (hence fast execution), whereas .format involves at least one extra method call
  3. because attribute value access also involves an extra method call, viz. __getattr__

I ran a slightly better analysis (on Python 3.8.2) using timeit of various formatting methods, results of which are as follows (pretty-printed with BeautifulTable) -

 +-----------------+-------+-------+-------+-------+-------+--------+ | Type \ num_vars |   1   |   2   |   5   |  10   |  50   |  250   | +-----------------+-------+-------+-------+-------+-------+--------+ |    f_str_str    | 0.056 | 0.063 | 0.115 | 0.173 | 0.754 | 3.717  | +-----------------+-------+-------+-------+-------+-------+--------+ |    f_str_int    | 0.055 | 0.148 | 0.354 | 0.656 | 3.186 | 15.747 | +-----------------+-------+-------+-------+-------+-------+--------+ |   concat_str    | 0.012 | 0.044 | 0.169 | 0.333 | 1.888 | 10.231 | +-----------------+-------+-------+-------+-------+-------+--------+ |    pct_s_str    | 0.091 | 0.114 | 0.182 | 0.313 | 1.213 | 6.019  | +-----------------+-------+-------+-------+-------+-------+--------+ |    pct_s_int    | 0.09  | 0.141 | 0.248 | 0.479 | 2.179 | 10.768 | +-----------------+-------+-------+-------+-------+-------+--------+ | dot_format_str  | 0.143 | 0.157 | 0.251 | 0.461 | 1.745 | 8.259  | +-----------------+-------+-------+-------+-------+-------+--------+ | dot_format_int  | 0.141 | 0.192 | 0.333 | 0.62  | 2.735 | 13.298 | +-----------------+-------+-------+-------+-------+-------+--------+ | dot_format2_str | 0.159 | 0.195 | 0.33  | 0.634 | 3.494 | 18.975 | +-----------------+-------+-------+-------+-------+-------+--------+ | dot_format2_int | 0.158 | 0.227 | 0.422 | 0.762 | 4.337 | 25.498 | +-----------------+-------+-------+-------+-------+-------+--------+ 

The trailing _str & _int represent the operation was carried out on respective value types.

Kindly note that the concat_str result for a single variable is essentially just the string itself, so it shouldn't really be considered.

My setup for arriving at the results -

from timeit import timeit from beautifultable import BeautifulTable  # pip install beautifultable  times = {}  for num_vars in (250, 50, 10, 5, 2, 1):     f_str = "f'{" + '}{'.join([f'x{i}' for i in range(num_vars)]) + "}'"     # "f'{x0}{x1}'"     concat = '+'.join([f'x{i}' for i in range(num_vars)])     # 'x0+x1'     pct_s = '"' + '%s'*num_vars + '" % (' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'     # '"%s%s" % (x0,x1)'     dot_format = '"' + '{}'*num_vars + '".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'     # '"{}{}".format(x0,x1)'     dot_format2 = '"{' + '}{'.join([f'{i}' for i in range(num_vars)]) + '}".format(' + ','.join([f'x{i}' for i in range(num_vars)]) + ')'     # '"{0}{1}".format(x0,x1)'      vars = ','.join([f'x{i}' for i in range(num_vars)])     vals_str = tuple(map(str, range(num_vars))) if num_vars > 1 else '0'     setup_str = f'{vars} = {vals_str}'     # "x0,x1 = ('0', '1')"     vals_int = tuple(range(num_vars)) if num_vars > 1 else 0     setup_int = f'{vars} = {vals_int}'     # 'x0,x1 = (0, 1)'      times[num_vars] = {         'f_str_str': timeit(f_str, setup_str),         'f_str_int': timeit(f_str, setup_int),         'concat_str': timeit(concat, setup_str),         # 'concat_int': timeit(concat, setup_int), # this will be summation, not concat         'pct_s_str': timeit(pct_s, setup_str),         'pct_s_int': timeit(pct_s, setup_int),         'dot_format_str': timeit(dot_format, setup_str),         'dot_format_int': timeit(dot_format, setup_int),         'dot_format2_str': timeit(dot_format2, setup_str),         'dot_format2_int': timeit(dot_format2, setup_int),     }  table = BeautifulTable() table.column_headers = ['Type \ num_vars'] + list(map(str, times.keys())) # Order is preserved, so I didn't worry much for key in ('f_str_str', 'f_str_int', 'concat_str', 'pct_s_str', 'pct_s_int', 'dot_format_str', 'dot_format_int', 'dot_format2_str', 'dot_format2_int'):     table.append_row([key] + [times[num_vars][key] for num_vars in (1, 2, 5, 10, 50, 250)]) print(table) 

I couldn't go beyond num_vars=250 because of the max arguments (255) limit with timeit.

tl;dr - Python string formatting performance : f-strings are fastest and more elegant, but at times (due to some implementation restrictions & being Py3.6+ only), you might have to use other formatting options as necessary.

like image 181
shad0w_wa1k3r Avatar answered Sep 30 '22 03:09

shad0w_wa1k3r