
How to read a pretty-printed dataframe back into a Pandas dataframe?

# necessary imports
from tabulate import tabulate
import pandas as pd

I have a dataframe:

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])

Using tabulate, I pretty-print it:

prettyprint=tabulate(df, headers='keys', tablefmt='psql')
print(prettyprint)

Result:

+----+-----+-----+-----+-----+
|    | A   | B   | C   | D   |
|----+-----+-----+-----+-----|
|  0 | A0  | B0  | C0  | D0  |
|  1 | A1  | B1  | C1  | D1  |
|  2 | A2  | B2  | C2  | D2  |
|  3 | A3  | B3  | C3  | D3  |
+----+-----+-----+-----+-----+

Saving it to a text file:

with open("PrettyPrintOutput.txt","w") as text_file:
    text_file.write(prettyprint)

How can I read PrettyPrintOutput.txt back into a dataframe without doing a lot of text processing manually?

zabop asked Sep 12 '25 14:09



1 Answer

One solution is to use clever keyword arguments in pd.read_csv / pd.read_clipboard:

    df = pd.read_csv(r'PrettyPrintOutput.txt', sep='|', comment='+', skiprows=[2], index_col=1)
    df = df[[col for col in df.columns if 'Unnamed' not in col]]

I just define all lines beginning with '+' as comments, so they don't get imported. This does not catch the third row (the separator between header and data), because it begins with '|' rather than '+'; it has to be excluded using skiprows.

The second line is needed because using '|' as the separator leaves you with additional empty columns (named 'Unnamed: ...') from the leading and trailing '|' on each row. If you know the column names in advance, use the keyword usecols to be explicit.

Output:

       A      B      C      D   
                                
0      A0     B0     C0     D0  
1      A1     B1     C1     D1  
2      A2     B2     C2     D2  
3      A3     B3     C3     D3 

It also works with pd.read_clipboard, because the functions accept the same keyword arguments.
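
Note that the column names and string values read this way keep the padding spaces from the table, as visible in the output above. A minimal sketch of stripping them, assuming the file has exactly the psql layout shown in the question; read_psql_table is a hypothetical helper name used only for illustration, not a pandas function, and the table is embedded as a string so the example runs on its own:

```python
import io

import pandas as pd

# The psql-style table from the question, embedded so the sketch is self-contained.
TABLE = """\
+----+-----+-----+-----+-----+
|    | A   | B   | C   | D   |
|----+-----+-----+-----+-----|
|  0 | A0  | B0  | C0  | D0  |
|  1 | A1  | B1  | C1  | D1  |
|  2 | A2  | B2  | C2  | D2  |
|  3 | A3  | B3  | C3  | D3  |
+----+-----+-----+-----+-----+
"""


def read_psql_table(source):
    """Hypothetical helper: read a tabulate 'psql' table and strip the padding."""
    # Same call as in the answer: '+' lines are comments, row 2 is the separator.
    df = pd.read_csv(source, sep='|', comment='+', skiprows=[2], index_col=1)
    # Drop the empty columns created by the leading and trailing '|'.
    df = df.loc[:, [col for col in df.columns if 'Unnamed' not in col]].copy()
    # Remove the padding from the column names and the (blank) index name.
    df.columns = df.columns.str.strip()
    df.index.name = None
    # Remove the padding from every string-valued column.
    for col in df.columns:
        if pd.api.types.is_string_dtype(df[col]):
            df[col] = df[col].str.strip()
    return df


df = read_psql_table(io.StringIO(TABLE))
print(df)
```

To read the saved file instead, pass 'PrettyPrintOutput.txt' as source. This only handles the layout above; cell values that themselves contain '|' or '+' would break it.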

above_c_level answered Sep 14 '25 05:09
