
Python string formatting: % vs concatenation

I'm developing an application in which I perform some requests to get an object id. After each one of them, I call a method (get_actor_info()) passing this id as an argument (see code below).

ACTOR_CACHE_KEY_PREFIX = 'actor_'

def get_actor_info(actor_id):
    cache_key = ACTOR_CACHE_KEY_PREFIX + str(actor_id)

As you can see, I'm casting actor_id to a string and concatenating it with a prefix. However, I know I could do it in several other ways (.format() or '%s%d', for instance), which leads to my question: would '%s%d' be better than string concatenation in terms of readability, code convention and efficiency?

Thanks

Amaury Medeiros asked Jan 05 '16 19:01




1 Answer

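If all the operands are string literals, CPython folds the + concatenation into a single constant at compile time. With named module-level constants, as below, the concatenation still runs, but only once, when the module is imported, so its cost is negligible. Ex.: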

DB_PREFIX = 'prod_'
INDEX_PREFIX = 'index_'

CRM_IDX_PREFIX = DB_PREFIX + INDEX_PREFIX + 'crm_'
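As a quick check of the optimization mentioned above, here is a minimal sketch that compiles two snippets and looks at the constants stored in the resulting code objects. It relies on CPython behavior (co_consts and compile-time folding are implementation details), and the snippet strings and variable names are made up for illustration.

```python
# CPython-specific check: which concatenations are folded at compile time?
literal_src = "name = 'prod_' + 'index_'"
named_src = "A = 'prod_'\nB = 'index_'\nname = A + B"

literal_code = compile(literal_src, "<literal>", "exec")
named_code = compile(named_src, "<named>", "exec")

# Literal operands: the joined string should already be stored as a constant.
literal_folded = "prod_index_" in literal_code.co_consts

# Named operands: only the pieces are stored; the + runs at execution time.
named_folded = "prod_index_" in named_code.co_consts

print("literals folded:", literal_folded)
print("names folded:", named_folded)
```

Either way, a module-level constant is built only once, so this difference rarely matters in practice.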

But in most cases the format function and the formatting operators are used to build strings from variable content. E.g.:

crm_index_name = "{}_{}".format(CRM_IDX_PREFIX, index_id)
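For the cache key from the question, the usual spellings all produce the same string. The sketch below, assuming actor_id is an int (required by '%d'), builds the key four ways and times each one with timeit. The helper names are hypothetical, and the timings depend on the interpreter and machine, so treat them as a rough indication only.

```python
import timeit

ACTOR_CACHE_KEY_PREFIX = 'actor_'

def key_concat(actor_id):
    # Explicit str() conversion, then +
    return ACTOR_CACHE_KEY_PREFIX + str(actor_id)

def key_percent(actor_id):
    # printf-style formatting; %d requires a number
    return '%s%d' % (ACTOR_CACHE_KEY_PREFIX, actor_id)

def key_format(actor_id):
    # str.format() converts the arguments itself
    return '{}{}'.format(ACTOR_CACHE_KEY_PREFIX, actor_id)

def key_fstring(actor_id):
    # Formatted string literal (Python 3.6+)
    return f'{ACTOR_CACHE_KEY_PREFIX}{actor_id}'

builders = [key_concat, key_percent, key_format, key_fstring]
keys = [build(42) for build in builders]

# Measure each builder; no particular ranking is assumed here.
timings = {}
for build in builders:
    timings[build.__name__] = timeit.timeit(lambda: build(42), number=100_000)

for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f}s')
```

For a key this simple the differences are usually small compared with the cache lookup itself, so readability is a reasonable tie-breaker.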

In practical terms, if you use the + operator to concatenate like this:

crm_index_name = CRM_IDX_PREFIX + '_' + str(index_id)

the format is hard-coded in the expression itself. If you use a format string with named fields, the code is more readable. E.g.:

crm_index_name = "{db_prefix}_{idx_prefix}_{mod_prefix}_{id}".format(
   db_prefix=CRM_IDX_PREFIX,
   idx_prefix=INDEX_PREFIX,
   mod_prefix='crm',
   id=index_id,
)

That way you can also define the format once, as a constant. E.g.:

IDX_FORMAT = "{db_prefix}_{idx_prefix}_{mod_prefix}_{id}"

crm_index_name = IDX_FORMAT.format(
   db_prefix=CRM_IDX_PREFIX,
   idx_prefix=INDEX_PREFIX,
   mod_prefix='crm',
   id=index_id,
)

This makes things clearer if you need to change the format in the future. For example, to reorder the fields or change a separator, you only need to change the format string to:

IDX_FORMAT = "{db_prefix}_{mod_prefix}_{idx_prefix}-{id}"

As a plus, for debugging you can collect all those values in a dictionary (which is easy to print or inspect) and pass it as keyword arguments to the format function:

idx_name_parts = {
   'db_prefix': CRM_IDX_PREFIX,
   'idx_prefix': INDEX_PREFIX,
   'mod_prefix': 'crm',
   'id': index_id,
}
crm_index_name = IDX_FORMAT.format(**idx_name_parts)
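To make the effect concrete, here is a small sketch with hypothetical values (no trailing underscores, unlike the constants above) that feeds the same dictionary to both format strings. It also shows that a missing field raises KeyError, which names the field that was forgotten.

```python
IDX_FORMAT_V1 = "{db_prefix}_{idx_prefix}_{mod_prefix}_{id}"
IDX_FORMAT_V2 = "{db_prefix}_{mod_prefix}_{idx_prefix}-{id}"

# Hypothetical sample values, chosen only for illustration.
idx_name_parts = {
    'db_prefix': 'prod',
    'idx_prefix': 'index',
    'mod_prefix': 'crm',
    'id': 7,
}

# Same data, different layout: only the format constant changes.
name_v1 = IDX_FORMAT_V1.format(**idx_name_parts)
name_v2 = IDX_FORMAT_V2.format(**idx_name_parts)
print(name_v1)  # prod_index_crm_7
print(name_v2)  # prod_crm_index-7

# A forgotten field fails loudly instead of producing a wrong name.
try:
    IDX_FORMAT_V1.format(db_prefix='prod')
except KeyError as exc:
    missing_field = exc.args[0]
    print('missing field:', missing_field)
```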

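Taking advantage of the globals() function, we can also refer to module-level names directly in the format string (this only works if index_id is a global too):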

IDX_FORMAT = "{CRM_IDX_PREFIX}_{mod_prefix}_{INDEX_PREFIX}-{index_id}"

crm_index_name = IDX_FORMAT.format(mod_prefix='crm', **globals())

The globals() variant is similar to a formatted string literal (f-string), available since Python 3.6, which resolves the names in the enclosing scope:

crm_index_name = f"{CRM_IDX_PREFIX}_crm_{INDEX_PREFIX}-{index_id}"
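One caveat with the globals() variant: it only sees module-level names, so a value that lives in a function's local scope has to be added explicitly. A minimal sketch, assuming hypothetical constant values and a hypothetical build_index_name() helper, using str.format_map() with a merged dictionary:

```python
# Hypothetical module-level values, for illustration only.
CRM_IDX_PREFIX = 'prod'
INDEX_PREFIX = 'index'
IDX_FORMAT = "{CRM_IDX_PREFIX}_{mod_prefix}_{INDEX_PREFIX}-{index_id}"

def build_index_name(index_id):
    # globals() does not contain the local index_id, so merge it in.
    names = {**globals(), 'mod_prefix': 'crm', 'index_id': index_id}
    return IDX_FORMAT.format_map(names)

crm_index_name = build_index_name(7)
print(crm_index_name)  # prod_crm_index-7
```

An f-string avoids this bookkeeping because it looks names up in the scope where it is written, but it cannot be stored as a reusable constant the way IDX_FORMAT can.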

I also see internationalization as another context where format strings are more useful than the + operator. Take the following code:

message = "The account " + str(account_number) + " doesn't exist"

If you use a translation feature like the gettext module with the + operator, the message has to be translated in fragments:

message = _("The account ") + str(account_number) + _(" doesn't exist")

so it is better to translate the whole format string:

message = _("The account {account_number} doesn't exist").format(account_number=account_number)

so that the translator sees the complete message, which makes more sense in the Spanish translation file:

#: main.py:523
msgid "The account {account_number} doesn't exist"
msgstr "La cuenta {account_number} no existe."

That is especially helpful when translating into natural languages whose grammar requires a different word order in the sentence, such as German.
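The word-order point can be sketched without real translation files. In the example below a plain dictionary stands in for a gettext catalog and _() is a hypothetical lookup function, not the gettext API itself; the message and its German translation are made up for illustration. Because the fields are named, the translation is free to move them around.

```python
# Hypothetical stand-in for a gettext catalog; real code would load
# compiled .mo files through the gettext module instead.
CATALOG_DE = {
    "{user} deleted {count} files":
        "{count} Dateien wurden von {user} gelöscht",
}

def _(msgid):
    # Fall back to the original message when no translation exists.
    return CATALOG_DE.get(msgid, msgid)

# The translated format string places {count} before {user}.
message = _("{user} deleted {count} files").format(user="Ana", count=3)
print(message)  # 3 Dateien wurden von Ana gelöscht

# An untranslated message is formatted as written.
fallback = _("Hello {user}").format(user="Ana")
print(fallback)  # Hello Ana
```

With + concatenation the fragments would be glued together in the source-language order, which no translation of the individual pieces could fix.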

yucer answered Sep 22 '22 09:09
