Why does Python have a format function as well as a format method

Tags: python, string, format, python-2.6, built-in

The format function in builtins seems to be a subset of the str.format method, used specifically for the case of formatting a single object.

e.g.

    >>> format(13, 'x')
    'd'

is apparently preferred over

    >>> '{0:x}'.format(13)
    'd'

and IMO it does look nicer, but why not just use str.format in every case to make things simpler? Both of these were introduced in 2.6 so there must be a good reason for having both at once, what is it?

Edit: I was asking about str.format and format, not why we don't have a (13).format

790

asked May 22 '13 04:05

jamylak

1 Answer

TL;DR: format just calls obj.__format__, and it is used by the str.format method, which does additional higher-level work on top of it. At the lower level, it makes sense to teach an object how to format itself.
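To make that concrete, here is a minimal sketch of an object that knows how to format itself. The Temperature class and its trailing "F" spec convention are hypothetical, invented only for this illustration; the point is that both format and str.format end up in the same __format__ method.

```python
class Temperature:
    """Hypothetical class, used only to illustrate __format__."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, format_spec):
        # A trailing "F" is our own made-up convention for Fahrenheit;
        # the rest of the spec is delegated to float's formatting.
        if format_spec.endswith("F"):
            fahrenheit = self.celsius * 9 / 5 + 32
            return format(fahrenheit, format_spec[:-1]) + " F"
        return format(self.celsius, format_spec) + " C"


t = Temperature(21.5)
print(format(t, ".1f"))        # built-in function
print("{0:.1f}".format(t))     # str.format, same __format__ underneath
print("{:.1fF}".format(t))     # the custom spec reaches __format__ unchanged
```

Everything after the colon in a replacement field is passed to __format__ as format_spec, which is why the class can define its own mini-language.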

It is just syntactic sugar

The fact that this function shares its name and format specification with str.format can be misleading. The existence of str.format is easy to explain: it does complex string interpolation (replacing the old % operator). format, on the other hand, formats a single object as a string, using the smallest subset of the str.format specification. So, why do we need format?

The format function is an alternative to the obj.format('fmt') construct found in some OO languages. This decision is consistent with the rationale for len (on why Python uses a function len(x) instead of a property x.length like JavaScript or Ruby).

When a language adopts the obj.format('fmt') construct (or obj.length, obj.toString and so on), classes are prevented from having an attribute called format (or length, toString, you get the idea), because it would shadow the standard method from the language. In this case, the language designers are placing the burden of preventing name clashes on the programmer.

Python is very fond of the PoLA (principle of least astonishment) and adopted the __dunder__ (double underscore) convention for the methods behind built-ins in order to minimize the chance of conflicts between user-defined attributes and the language built-ins. So obj.format('fmt') becomes obj.__format__('fmt'), and of course you can call obj.__format__('fmt') instead of format(obj, 'fmt') (the same way you can call obj.__len__() instead of len(obj)).
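A small sketch of the name-clash point: the Invoice class below is hypothetical and stores ordinary data in an attribute that happens to be called format. Because the built-in looks for __format__ rather than format, the two never collide.

```python
class Invoice:
    """Hypothetical class with a user-defined attribute named 'format'."""

    def __init__(self, total, file_format):
        self.total = total
        self.format = file_format  # plain data, e.g. "pdf"

    def __format__(self, format_spec):
        # Delegate to the number's own formatting.
        return format(self.total, format_spec)


inv = Invoice(1234.5, "pdf")
print(inv.format)             # the user's attribute is untouched
print(format(inv, ",.2f"))    # the built-in still works
```

In a language where formatting lived at obj.format(...), the attribute assignment in __init__ would have replaced the formatting method on that instance.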

Using your example:

    >>> '{0:x}'.format(13)
    'd'
    >>> (13).__format__('x')
    'd'
    >>> format(13, 'x')
    'd'

Which one is cleaner and easier to type? Python's design is very pragmatic: the function form is not only cleaner, it is also well aligned with Python's duck-typed approach to OO, and it gives the language designers freedom to change or extend the underlying implementation without breaking legacy code.
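As an illustration of the duck-typing remark, the same call works on unrelated types, each interpreting the spec in its own way. This sketch uses only standard-library types; the particular values are arbitrary.

```python
from datetime import date
from decimal import Decimal

# One uniform call; each type supplies its own __format__.
print(format(13, "x"))                          # int: hexadecimal
print(format(Decimal("3.14159"), ".2f"))        # Decimal: fixed-point
print(format(date(2013, 5, 22), "%Y-%m-%d"))    # date: strftime-style spec
```

The caller does not need to know which class it is dealing with, only that the object can be formatted.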

PEP 3101 introduced the new str.format method and the format built-in without any comment on the rationale for the format function, but the implementation is obviously just syntactic sugar:

    def format(value, format_spec):
        return value.__format__(format_spec)
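To show how the higher-level method can be layered on top of the function, here is a rough sketch of a stripped-down str.format. The name simple_format is hypothetical, and this is not how CPython implements it; it handles positional fields only and ignores attribute lookups, keyword arguments and nested specs. It relies on string.Formatter.parse from the standard library to split the template.

```python
from string import Formatter


def simple_format(template, *args):
    """Hypothetical, simplified stand-in for str.format (positional only)."""
    pieces = []
    auto_index = 0
    for literal, field, spec, conversion in Formatter().parse(template):
        pieces.append(literal)
        if field is None:
            continue  # literal text with no replacement field after it
        if field == "":
            index = auto_index  # automatic numbering for "{}"
            auto_index += 1
        else:
            index = int(field)
        value = args[index]
        if conversion == "r":
            value = repr(value)
        elif conversion == "s":
            value = str(value)
        # Each field is handed to the built-in, i.e. to value.__format__.
        pieces.append(format(value, spec))
    return "".join(pieces)


print(simple_format("{0:x} and {1!r}", 13, "hi"))
print(simple_format("{:>5}|{:.2f}", "ab", 3.14159))
```

The interpolation logic (parsing, field lookup, conversions) belongs to the string method, while the per-object formatting is a single format call.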

And here I rest my case.

What Guido said about it (or is it official?)

Quoting the BDFL himself about len:

First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:

(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.

(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.

Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/

Source: pyfaq@effbot.org (the original post also includes the question Guido was answering). abarnert also suggests:

There's additional reasoning about len in the Design and History FAQ. Although it's not as complete or as good of an answer, it is indisputably official. – abarnert

Is this a practical concern or just syntax nitpicking?

This is a very practical, real-world concern in languages like Python, Ruby or JavaScript because in dynamically typed languages any mutable object is effectively a namespace, and the concept of private methods or attributes is a matter of convention. I could not put it better than abarnert did in his comment:

Also, as far as the namespace-pollution issue with Ruby and JS, it's worth pointing out that this is an inherent problem with dynamically-typed languages. In statically-typed languages as diverse as Haskell and C++, type-specific free functions are not only possible, but idiomatic. (See The Interface Principle.) But in dynamically-typed languages like Ruby, JS, and Python, free functions must be universal. A big part of language/library design for dynamic languages is picking the right set of such functions.

For example, I just left Ember.js in favor of Angular.js because I was tired of namespace conflicts in Ember; Angular handles this using an elegant Python-like strategy of prefixing built-in methods (with $thing in Angular, instead of underscores as in Python), so they do not conflict with user-defined methods and properties. Yes, the whole __thing__ is not particularly pretty, but I'm glad Python took this approach because it is very explicit and avoids the class of PoLA bugs caused by object namespace clashes.

134

answered Oct 02 '22 14:10

29 revs, 2 users 97%
