
pandas: How to get .to_string() method to align column headers with column values?

Tags: python, pandas

This has been stumping me for a while, and I feel like there has to be a solution, since printing a dataframe always aligns the column headers with their respective values.

Example:

import pandas as pd

df = pd.DataFrame({'First column name': [1234, 2345, 3456], 'Second column name': [5432, 4321, 6543], 'Third column name': [1236, 3457, 3568]})
df_string = df.to_string(justify='left', col_space=30)


Now when you print df_string, you get the desired formatting: http://i.imgur.com/Xyoy4Op.png

But when I take the string and view it (in this case, I'm passing the string to a PyQt widget that displays text), this is the output: http://i.imgur.com/a1NcBQA.png

(This is how the string appears on my console): http://i.imgur.com/WRHEhKB.png



Any help is greatly appreciated.

yobogoya asked Jun 17 '16 23:06



1 Answer

This lines up column headers nicely:

print(df.to_string())

But this prints the indices too. If you don't want to print the indices, you can pass index=False:

print(df.to_string(index=False))

Problem is, the column headers no longer line up correctly.
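To compare the two calls yourself, here is a minimal sketch using the dataframe from the question. The names with_index and without_index are just for illustration. How the headers line up with index=False may differ between pandas versions, so no expected output is shown.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'First column name': [1234, 2345, 3456],
    'Second column name': [5432, 4321, 6543],
    'Third column name': [1236, 3457, 3568],
})

# Default rendering: the index labels are the leftmost column.
with_index = df.to_string()

# Same rendering with the index column left out.
without_index = df.to_string(index=False)

print(with_index)
print()
print(without_index)
```

Printing both in the same console makes it easy to see whether the headers still sit over their values.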

So I wrote this hack:

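import re

# Matches the header line; group 1 is the first word of the first column name.
blanks = r'^ *([a-zA-Z_0-9-]*) .*$'
blanks_comp = re.compile(blanks)

def find_index_in_line(line):
    # Return the offset of the first non-space character that follows
    # a space, i.e. where the first data value starts on this line.
    index = 0
    spaces = False
    for ch in line:
        if ch == ' ':
            spaces = True
        elif spaces:
            break
        index += 1
    return index

def pretty_to_string(df):
    lines = df.to_string().split('\n')
    header = lines[0]
    m = blanks_comp.match(header)
    indices = []
    if m:
        # Offset where the first column name starts in the header.
        st_index = m.start(1)
        indices.append(st_index)

    non_header_lines = lines[1:]

    for line in non_header_lines:
        index = find_index_in_line(line)
        indices.append(index)

    # Smallest offset over all lines; everything before it is index column.
    mn = min(indices)
    newlines = []
    for l in lines:
        newlines.append(l[mn:])

    return '\n'.join(newlines)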

Which you invoke like this:

print(pretty_to_string(df))

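The code works by calling df.to_string() (where the columns are lined up nicely) and finding, over all lines, the smallest offset at which the first column's text starts. That offset is the number of characters taken up by the index column.

It then strips that many characters from the start of each line, which removes the indices while leaving the alignment intact.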
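The same two steps can also be written more compactly. This is only a sketch of the idea, not a drop-in replacement: strip_index_column is a hypothetical name, and it assumes a single header line (no MultiIndex columns, no named index) and index labels that contain no spaces.

```python
import re
import pandas as pd

def strip_index_column(df):
    """Render df with to_string() and cut the index column off every line.

    Assumes one header line and index labels without spaces.
    """
    lines = df.to_string().split("\n")
    header, rows = lines[0], lines[1:]

    # Header: offset of the first non-space character.
    offsets = [len(header) - len(header.lstrip(" "))]

    # Data rows: offset just past the index label and the spaces after it.
    for row in rows:
        m = re.match(r"\s*\S+\s+", row)
        offsets.append(m.end() if m else len(row))

    # Cutting at the smallest offset removes the index column without
    # touching any header or value text.
    cut = min(offsets)
    return "\n".join(line[cut:] for line in lines)

df = pd.DataFrame({"a": [1, 22], "bb": [3, 4]})
result = strip_index_column(df)
print(result)
```

Because every line loses the same number of leading characters, whatever alignment df.to_string() produced is preserved.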

E L answered Sep 23 '22 12:09
