Specify the number of spaces between pandas DataFrame columns when printing

When you print a pandas DataFrame, which calls DataFrame.to_string, it normally inserts a minimum of 2 spaces between the columns. For example, this code

import pandas as pd

df = pd.DataFrame( {
    "c1" : ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2" : (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5]
} )
print(df)

outputs

       c1  c2  a3235235235
0       a  11            1
1      bb  22            2
2     ccc  33            3
3    dddd  44            4
4  eeeeee  55            5

which has a minimum of 2 spaces between each column.

I am copying DataFrames printed on the console and pasting them into documents, and I have received feedback that they are hard to read: people would like more spaces between the columns.

Is there a standard way to do that?

I see no option in either DataFrame.to_string or pandas.set_option.

I have done a web search and have not found an answer. One related question asks how to remove those 2 spaces, while another asks why sometimes only 1 space appears between columns instead of 2 (I have also seen this bug and hope someone answers that question).

My hack solution is to define a function that converts a DataFrame's columns to type str, and then prepends each element with a string of the specified number of spaces.

This code (added to the code above)

def prependSpacesToColumns(df: pd.DataFrame, n: int = 3):
    spaces = ' ' * n
    
    # work on a copy so the caller's column labels are not modified:
    df = df.copy()
    
    # ensure every column name has the leading spaces:
    if isinstance(df.columns, pd.MultiIndex):
        for i in range(df.columns.nlevels):
            levelNew = [spaces + str(s) for s in df.columns.levels[i]]
            # set_levels returns a new index (the inplace argument was removed in pandas 2.0):
            df.columns = df.columns.set_levels(levelNew, level = i)
    else:
        df.columns = spaces + df.columns
    
    # ensure every element has the leading spaces:
    df = df.astype(str)
    df = spaces + df
    
    return df

dfSp = prependSpacesToColumns(df, 3)
print(dfSp)

outputs

          c1     c2    a3235235235
0          a     11              1
1         bb     22              2
2        ccc     33              3
3       dddd     44              4
4     eeeeee     55              5

which is the desired effect.

But I think that pandas surely must have some builtin simple standard way to do this. Did I miss how?

Also, the solution needs to handle a DataFrame whose columns are a MultiIndex. To continue the code example, consider this modification:

idx = (("Outer", "Inner1"), ("Outer", "Inner2"), ("Outer", "a3235235235"))
df.columns = pd.MultiIndex.from_tuples(idx)

HaroldFinch asked Feb 25 '21 19:02


1 Answer

You can accomplish this through formatters; it takes a bit of code to build the dictionary {'col_name': format_function}. For each column, take the maximum character length of its values or the length of the column header, whichever is greater, add some padding, and use the result as the width of a right-aligned format string.

Use partial from functools because to_string expects each formatter to be a one-parameter function, yet we need to specify a different width for each column.
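
As a small illustration of the point about partial (the helper name right_align and the two widths are made up for this sketch and are not part of pandas): partial fixes the width argument up front, so what remains is a function of one argument, which is the shape a formatter needs.

```python
from functools import partial


def right_align(value, width):
    # Right-align the string form of value in a field of `width` characters.
    return f"{str(value):>{width}}"


# Bind a different width for each (hypothetical) column.
fmt_narrow = partial(right_align, width=4)
fmt_wide = partial(right_align, width=8)

print(repr(fmt_narrow("ab")))  # '  ab'
print(repr(fmt_wide("ab")))    # '      ab'
```

partial binds the width when the formatter is created. A lambda defined inside a loop would instead look up the loop variable when it is called, so every column would end up with the last width unless the value is bound explicitly.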

Sample Data

import pandas as pd
df = pd.DataFrame({"c1": ("a", "bb", "ccc", "dddd", 'eeeeee'),
                   "c2": (11, 22, 33, 44, 55),
                   "a3235235235": [1,2,3,4,5]})

Code

from functools import partial

# Formatting string 
def get_fmt_str(x, fill):
    return '{message: >{fill}}'.format(message=x, fill=fill)

# Max character length per column
s = df.astype(str).agg(lambda x: x.str.len()).max() 

pad = 6  # Desired number of spaces between columns
fmts = {}
for idx, c_len in s.items():  # Series.iteritems() was removed in pandas 2.0
    # Deal with MultiIndex tuples or simple string labels.
    if isinstance(idx, tuple):
        lab_len = max([len(str(x)) for x in idx])
    else:
        lab_len = len(str(idx))

    fill = max(lab_len, c_len) + pad - 1
    fmts[idx] = partial(get_fmt_str, fill=fill)

print(df.to_string(formatters=fmts))

            c1      c2      a3235235235
0            a      11                1
1           bb      22                2
2          ccc      33                3
3         dddd      44                4
4       eeeeee      55                5

# MultiIndex Output
         Outer                             
        Inner1      Inner2      a3235235235
0            a          11                1
1           bb          22                2
2          ccc          33                3
3         dddd          44                4
4       eeeeee          55                5
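
The steps above can be wrapped in a function so that the same code serves plain and MultiIndex columns. This is only a sketch: to_string_padded and pad_cell are names invented here, and it assumes the column labels are unique and the DataFrame is not empty.

```python
from functools import partial

import pandas as pd


def pad_cell(value, width):
    # Right-align the string form of a cell in a field of `width` characters.
    return f"{str(value):>{width}}"


def to_string_padded(df, pad=6):
    # Hypothetical helper: builds one formatter per column, then calls to_string.
    formatters = {}
    for col in df.columns:
        # Longest cell in this column, measured on its string form.
        cell_len = int(df[col].astype(str).str.len().max())
        # A MultiIndex label is a tuple; use the longest of its levels.
        parts = col if isinstance(col, tuple) else (col,)
        label_len = max(len(str(p)) for p in parts)
        width = max(label_len, cell_len) + pad - 1
        formatters[col] = partial(pad_cell, width=width)
    return df.to_string(formatters=formatters)


df = pd.DataFrame({"c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
                   "c2": (11, 22, 33, 44, 55),
                   "a3235235235": [1, 2, 3, 4, 5]})
print(to_string_padded(df, pad=6))

# Same call after switching to MultiIndex columns, as in the question.
df.columns = pd.MultiIndex.from_tuples(
    [("Outer", "Inner1"), ("Outer", "Inner2"), ("Outer", "a3235235235")]
)
print(to_string_padded(df, pad=6))
```

Measuring each column with df[col] keeps the loop identical for both kinds of column labels, at the cost of requiring unique labels.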

ALollz answered Oct 16 '22 18:10