If I have an existing pandas dataframe, is there a way to generate the python code, which when executed in another python script, will reproduce that dataframe.
e.g.
In[1]: df Out[1]: income user 0 40000 Bob 1 50000 Jane 2 42000 Alice In[2]: someFunToWriteDfCode(df) Out[2]: df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice'], ...: 'income': [40000, 50000, 42000]})
data takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame. For the row labels, the Index to be used for the resulting frame is Optional Default np. arange(n) if no index is passed. For column labels, the optional default syntax is - np.
to_numeric() The best way to convert one or more columns of a DataFrame to numeric values is to use pandas. to_numeric() . This function will try to change non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
What is a DataFrame? A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.
You could try to use the to_dict() method on DataFrame:
print "df = pd.DataFrame( %s )" % (str(df.to_dict()))
If your data contains NaN's, you'll have to replace them with float('nan'):
print "df = pd.DataFrame( %s )" % (str(df.to_dict()).replace(" nan"," float('nan')"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With