Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Formatting thousand separator for integers in a pandas dataframe

Tags:

python

pandas

I'm trying to use '{:,}'.format(number) like the example below to format a number in a pandas dataframe:

# This works for floats and integers
print '{:,}'.format(20000)
# 20,000
print '{:,}'.format(20000.0)
# 20,000.0

The problem is that with a dataframe that has integers does not work, and in a dataframe with float works ok. See the examples:

# Does not work. The format stays the same, does not show thousands separator
df_int = DataFrame({"A": [20000, 10000]})
print df_int.to_html(float_format=lambda x: '{:,}'.format(x))

# Example of result
# <tr>
#   <th>0</th>
#   <td> 20000</td>
# </tr

# Works OK
df_float = DataFrame({"A": [20000.0, 10000.0]})
print df_float.to_html(float_format=lambda x: '{:,}'.format(x))

# Example of result
# <tr>
#   <th>0</th>
#   <td>20,000.0</td>
# </tr>

What i'm doing wrong?

like image 516
Javier Cárdenas Avatar asked Jul 23 '14 23:07

Javier Cárdenas


People also ask

How do you add thousands separator in Pandas?

We use the python string format syntax '{:,. 0f}'. format to add the thousand comma separators to the numbers. Then we use python's map() function to iterate and apply the formatting to all the rows in the 'Median Sales Price' column.

How do you add a thousands separator in Python?

Using the modern f-strings is, in my opinion, the most Pythonic solution to add commas as thousand-separators for all Python versions above 3.6: f'{1000000:,}' . The inner part within the curly brackets :, says to format the number and use commas as thousand separators.

How Split comma Separated Values in Pandas DataFrame?

Split column by delimiter into multiple columnsApply the pandas series str. split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.


2 Answers

pandas (as of 0.20.1) does not allow overriding the default integer format in an easy way. It is hard coded in pandas.io.formats.format.IntArrayFormatter (the labmda function):

class IntArrayFormatter(GenericArrayFormatter):

    def _format_strings(self):
        formatter = self.formatter or (lambda x: '% d' % x)
        fmt_values = [formatter(x) for x in self.values]
        return fmt_values

I'm assuming is what you're actually asking for is how you can override the format for all integers: replace ("monkey patch") the IntArrayFormatter to print integer values with thousands separated by comma as follows:

import pandas

class _IntArrayFormatter(pandas.io.formats.format.GenericArrayFormatter):

    def _format_strings(self):
        formatter = self.formatter or (lambda x: ' {:,}'.format(x))
        fmt_values = [formatter(x) for x in self.values]
        return fmt_values

pandas.io.formats.format.IntArrayFormatter = _IntArrayFormatter

Note:

  • before 0.20.0, the formatters were in pandas.formats.format.
  • before 0.18.1, the formatters were in pandas.core.format.

Aside

For floats you do not need to jump through those hoops since there is a configuration option for it:

display.float_format: The callable should accept a floating point number and return a string with the desired format of the number. This is used in some places like SeriesFormatter. See core.format.EngFormatter for an example.

like image 107
kynan Avatar answered Oct 12 '22 01:10

kynan


The formatters parameter in to_html will take a dictionary of column names mapped to a formatting function. Below has an example of a function to build a dict that maps the same function to both floats and ints.

In [250]: num_format = lambda x: '{:,}'.format(x)

In [246]: def build_formatters(df, format):
     ...:     return {column:format 
     ...:               for (column, dtype) in df.dtypes.iteritems()
     ...:               if dtype in [np.dtype('int64'), np.dtype('float64')]}
     ...: 

In [247]: formatters = build_formatters(df_int, num_format)


In [249]: print df_int.to_html(formatters=formatters)
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>A</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>20,000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>10,000</td>
    </tr>
  </tbody>
</table>
like image 20
chrisb Avatar answered Oct 12 '22 01:10

chrisb