How to create a dictionary of pandas DataFrames and write each DataFrame to a separate Excel worksheet?
Hi All,
I am learning pandas and Python, and I want to create a dictionary that contains several dataframes, so that I can run metrics over each dataframe. For each unique cluster name (one of the columns) I would like to create a dataframe (a subset of the original dataframe).
Then I would like to be able to select each subset, run metrics over it, put the results in a new dataframe, and then write each subset into a separate worksheet using the xlsxwriter Python library.
#create dictionary object
c_dict = {}
#get a list of the unique names
c_dict= data.groupby('Cluster').groups
#create a dictionary of dataframes, one for each cluster
for cluster in c_dict.items():
    df = data[data['Cluster']==cluster
    c_dict[cluster] =df <<< im getting invalid syntax here
#go through the dictionary and create a worksheet and put the dataframe in it.
for k,v in c_dict.items():
    dataframe = GetDF(k) <<< creating worksheets and puts the data from the dataframe > worksheet is not working because of invalid syntax when trying to create dataframe dictionary ^^
    dataframe.to_excel(writer,sheet_name=k)
writer.save
#get the dataframe from the dictionary,
GetDF(dictionary_key)
    return c_dict[dictionary_key]
You can convert a dictionary to a pandas DataFrame by building a list of the dictionary's items with list(my_dict.items()). You can also pass the column header values using the columns parameter.
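As a rough illustration of that conversion, here is a minimal sketch. The dictionary contents and the column names "key" and "value" are made up for the example; only list(my_dict.items()) and the columns parameter come from the text above.

```python
import pandas as pd

# Hypothetical dictionary, used only for illustration.
my_dict = {"alpha": 1, "beta": 2, "gamma": 3}

# Each (key, value) pair becomes one row; `columns` names the two columns.
df = pd.DataFrame(list(my_dict.items()), columns=["key", "value"])
print(df)
```

Note that this produces a DataFrame from a plain dictionary of scalars. It is a different task from the question's goal of keeping one DataFrame per cluster inside a dictionary.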
To write to multiple sheets, create an ExcelWriter object with a target file name and specify the sheet to write to. Multiple sheets can be written by giving each one a unique sheet_name. Once all the data has been written, the changes must be saved, either by closing the writer or by using it as a context manager.
Use the pandas to_excel() method to write a DataFrame to an Excel sheet with the .xlsx extension. By default it writes a single DataFrame to an Excel file; to write multiple sheets, pass an ExcelWriter object created with the target file name, together with the sheet name to write to.
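A minimal sketch of writing several DataFrames to separate sheets might look like the following. The frames, sheet names, and file name are hypothetical, and it assumes an Excel engine such as openpyxl or xlsxwriter is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frames keyed by the sheet name they should be written to.
frames = {
    "north": pd.DataFrame({"x": [1, 2]}),
    "south": pd.DataFrame({"x": [3, 4, 5]}),
}

# Write into a temporary directory so the example leaves no stray files behind.
target = os.path.join(tempfile.mkdtemp(), "clusters.xlsx")

# Used as a context manager, the writer saves and closes the file on exit,
# so no explicit save call is needed.
with pd.ExcelWriter(target) as writer:
    for sheet_name, frame in frames.items():
        frame.to_excel(writer, sheet_name=sheet_name, index=False)
```

Sheet names must be strings, and Excel limits them to 31 characters, so cluster keys that are numbers or long labels may need converting or shortening first.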
I think this is what you're looking for. As I said in the comments, it's probably not the right solution and it's definitely not idiomatic for pandas DataFrames. The invalid syntax itself comes from the unclosed bracket in data[data['Cluster']==cluster.
import pandas as pd
groups = data.groupby('Cluster')
#create a dictionary of dataframes, one for each cluster
#iterating over the groupby yields (cluster name, sub-dataframe) pairs
c_dict = {k: v for k, v in groups}
If you want to save this to an Excel file, the documentation is here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
There is a nice example at the bottom that will do what you need. Hint: use for k, v in myDict.items() (iteritems() on Python 2) to get keys and values.
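To tie this back to the question, here is a hedged sketch of building the dictionary and running metrics over each subset. Only the 'Cluster' column name comes from the question; the sample data, the "value" column, and the chosen metrics are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the 'Cluster' column name is from the question.
data = pd.DataFrame({
    "Cluster": ["a", "b", "a", "c", "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# One sub-DataFrame per unique cluster name.
c_dict = {name: frame for name, frame in data.groupby("Cluster")}

# Run metrics over each subset and collect them in a new DataFrame,
# one row per cluster.
metrics = pd.DataFrame.from_dict(
    {
        name: {"rows": len(frame), "mean_value": frame["value"].mean()}
        for name, frame in c_dict.items()
    },
    orient="index",
)
print(metrics)
```

Each value in c_dict is an ordinary DataFrame, so it can be passed to to_excel() with its key as the sheet name.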