Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get Python pandas DataFrame from string written by print()?

THIS is an updated version of the question providing a handy function

pd_read_printed(str_printed_df)

designed to create a pandas DataFrame out of the string written previously using print(some_pandas_DataFrame) :

def pd_read_printed(str_printed_df):
    global pd, StringIO
    try: x = pd
    except: import pandas as pd
    try: x = StringIO
    except: from pandas.compat import StringIO
    return pd.read_csv(StringIO(str_printed_df), delim_whitespace=True)

I put it together for own use after I have got here the answers to the following question:

I see in Internet often the content of a pandas DataFrame in its printed version like for example::

df1_as_string = """
 Sp   Mt   Value  count
4  MM2  S4   bg     10
5  MM2  S4   dgd    1
6  MM4  S2   rd     2
7  MM4  S2   cb     8
8  MM4  S2   uyi    8 
"""

The question is: How to obtain a variable holding the DataFrame from a string variable in a style like:

df1 = pandas.someToMeUnknownPandasFunction(df1_as_string)

?

NOW let's use the provided function to create a DataFrame from df1_as_string:

df1 = pd_read_printed(df1_as_string)

and check if it worked as expected:

print(df1)

gives:

    Sp  Mt Value  count
4  MM2  S4    bg     10
5  MM2  S4   dgd      1
6  MM4  S2    rd      2
7  MM4  S2    cb      8
8  MM4  S2   uyi      8
like image 754
Claudio Avatar asked Apr 23 '17 13:04

Claudio


1 Answers

Two methods

option 1
pd.read_clipboard

This is my goto method for simply formatted dataframes. I copy the dataframe text and follow it up with df = pd.read_clipboard()

option 2
StringIO + pd.read_csv

For dataframes that have more complicated structure, I may need some of the options in read_csv so I might set it up this way. Keep in mind that for the dataframe you provided, I'd almost never do it this way because it's slower for me to get at a dataframe.

from io import StringIO
import pandas as pd

df1_as_string = """
 Sp   Mt   Value  count
4  MM2  S4   bg     10
5  MM2  S4   dgd    1
6  MM4  S2   rd     2
7  MM4  S2   cb     8
8  MM4  S2   uyi    8 
"""

df = pd.read_csv(StringIO(df1_as_string), delim_whitespace=True)

In either case, I end up with:

print(df)

    Sp  Mt Value  count
4  MM2  S4    bg     10
5  MM2  S4   dgd      1
6  MM4  S2    rd      2
7  MM4  S2    cb      8
8  MM4  S2   uyi      8
like image 65
piRSquared Avatar answered Oct 13 '22 22:10

piRSquared