Want to output a Pandas groupby dataframe to CSV. Tried various StackOverflow solutions but they have not worked.
Python 3.6.1, Pandas 0.20.1
groupby result looks like:
```
      id  month   year  count
week
0   9066     82  32142    895
1   7679     84  30112    749
2   8368    126  42187    872
3  11038    102  34165    976
4   8815    117  34122    767
5  10979    163  50225   1252
6   8726    142  38159    996
7   5568     63  26143    582
```
Want a csv that looks like
```
week,count
0,895
1,749
2,872
3,976
4,767
5,1252
6,996
7,582
```
Current code:
```python
week_grouped = df.groupby('week')
week_grouped.sum() # At this point you have the groupby result
week_grouped.to_csv('week_grouped.csv') # Can't do this - .to_csv is not a df function.
```
Read SO solutions:
output groupby to csv file pandas
week_grouped.drop_duplicates().to_csv('week_grouped.csv')
Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method
Python pandas - writing groupby output to file
week_grouped.reset_index().to_csv('week_grouped.csv')
Result: AttributeError: "Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method"
The pandas DataFrame.to_csv() method converts a DataFrame into CSV data. If you pass it a file path or a file object, the CSV data is written there; an existing file at that path is overwritten. If no path is given, the DataFrame is serialized and the CSV text is returned as a string. By default, to_csv() uses a comma as the delimiter and writes the row index as the first column.
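A small sketch of those defaults, using a made-up two-row DataFrame; io.StringIO stands in for an open file object. It shows the string return value, the unnamed index column that appears by default, and how index=False removes it:

```python
import io

import pandas as pd

# Made-up two-row frame, only to show what to_csv() emits
df = pd.DataFrame({"week": [0, 1], "count": [895, 749]})

# No path: the CSV text is returned as a string,
# with the row index written as an unnamed first column
with_index = df.to_csv()

# index=False leaves the row index out of the output
without_index = df.to_csv(index=False)

# A file object also works as the target; StringIO stands in for open(...)
buf = io.StringIO()
df.to_csv(buf, index=False)

print(with_index)
print(without_index)
```

The index column is why a plain to_csv() call on a reset_index() result can end up with an extra leading column of row numbers.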
Try doing this:
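```python
week_grouped = df.groupby('week')
# sum() returns a DataFrame; reset_index() moves 'week' back into a column
week_grouped.sum().reset_index().to_csv('week_grouped.csv', index=False)
```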
That'll write the entire dataframe to the file. If you only want those two columns, then:
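```python
week_grouped = df.groupby('week')
# Keep only 'week' and 'count'; index=False keeps the row numbers out of the file
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv', index=False)
```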
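A variation that is not part of the original answer: select the count column before aggregating, so pandas sums only that column instead of id, month and year as well. The sample rows below are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical raw rows using the question's column names
df = pd.DataFrame({
    "id": [1, 2, 3],
    "month": [1, 1, 2],
    "year": [2017, 2017, 2017],
    "count": [500, 395, 749],
    "week": [0, 0, 1],
})

# Selecting 'count' first means only that column is summed (a Series indexed by week)
weekly_counts = df.groupby("week")["count"].sum()

# reset_index() turns the week index back into a column; index=False drops the row numbers
csv_text = weekly_counts.reset_index().to_csv(index=False)
print(csv_text)
```

To write a file instead of returning a string, pass a path such as 'week_grouped.csv' as the first argument to to_csv().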
Here's a line by line explanation of the original code:
```python
# This creates a "groupby" object (not a dataframe object)
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')

# This instructs pandas to sum up all the numeric type columns in each
# group. This returns a dataframe where each row is the sum of the
# group's numeric columns. You're not storing this dataframe in your
# example.
week_grouped.sum()

# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method.
# So we should store the previous line's result (a dataframe) into a variable
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')

# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')

# Or with less typing simply
week_grouped.sum().to_csv('...')
```
Try changing your second line to week_grouped = week_grouped.sum() and re-running all three lines.

If you run week_grouped.sum() in its own Jupyter notebook cell, you'll see that the statement displays its result as the cell's output instead of assigning it back to week_grouped. Some pandas methods have an inplace=True argument (e.g., df.sort_values(by=col_name, inplace=True)), but sum does not.
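A minimal sketch of that point on made-up data: calling sum() by itself leaves week_grouped as a groupby object, and only rebinding the name turns it into a DataFrame that has to_csv().

```python
import pandas as pd

# Made-up data: week 0 appears twice, week 1 once
df = pd.DataFrame({"week": [0, 0, 1], "count": [500, 395, 749]})

week_grouped = df.groupby("week")

week_grouped.sum()  # the summed DataFrame is built, shown in a notebook, then discarded
before = type(week_grouped).__name__  # still 'DataFrameGroupBy'

week_grouped = week_grouped.sum()  # rebinding the name keeps the result
after = type(week_grouped).__name__  # now 'DataFrame'

print(before, after)  # DataFrameGroupBy DataFrame
```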
EDIT: Does each week number only appear once in your CSV? If so, here's a simpler solution that doesn't use groupby:
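```python
import pandas as pd

df = pd.read_csv('input.csv')
# Select the two columns the target CSV needs; index=False omits the row numbers
df[['week', 'count']].to_csv('output.csv', index=False)
```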
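A sketch of that shortcut with an in-memory stand-in for input.csv (the CSV contents are made up). It adds a guard that checks the one-row-per-week assumption before writing, since selecting columns without grouping silently keeps duplicate weeks:

```python
import io

import pandas as pd

# Stand-in for input.csv; the contents are made up, one row per week
raw = io.StringIO(
    "id,month,year,count,week\n"
    "1,1,2017,895,0\n"
    "2,1,2017,749,1\n"
)

df = pd.read_csv(raw)

# The shortcut is only valid if no week repeats; otherwise aggregate with groupby
if not df["week"].is_unique:
    raise ValueError("week values repeat; use groupby('week') and sum() instead")

csv_text = df[["week", "count"]].to_csv(index=False)  # or .to_csv('output.csv', index=False)
print(csv_text)
```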