
Pandas groupby to to_csv

I want to write the result of a Pandas groupby to a CSV file. I tried various Stack Overflow solutions, but none of them worked.

Python 3.6.1, Pandas 0.20.1

groupby result looks like:

             id  month   year  count
    week
    0      9066     82  32142    895
    1      7679     84  30112    749
    2      8368    126  42187    872
    3     11038    102  34165    976
    4      8815    117  34122    767
    5     10979    163  50225   1252
    6      8726    142  38159    996
    7      5568     63  26143    582

Want a csv that looks like

    week  count
    0     895
    1     749
    2     872
    3     976
    4     767
    5     1252
    6     996
    7     582

Current code:

    week_grouped = df.groupby('week')
    week_grouped.sum()  # At this point you have the groupby result
    week_grouped.to_csv('week_grouped.csv')  # Can't do this - .to_csv is not a df function.

Read SO solutions:

output groupby to csv file pandas

week_grouped.drop_duplicates().to_csv('week_grouped.csv') 

Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method

Python pandas - writing groupby output to file

week_grouped.reset_index().to_csv('week_grouped.csv') 

Result: AttributeError: "Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method"

kalmdown asked Dec 01 '17 22:12


2 Answers

Try doing this:

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index().to_csv('week_grouped.csv')

That'll write the entire dataframe to the file. If you only want those two columns then,

    week_grouped = df.groupby('week')
    week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')

Here's a line by line explanation of the original code:

    # This creates a "groupby" object (not a dataframe object)
    # and you store it in the week_grouped variable.
    week_grouped = df.groupby('week')

    # This instructs pandas to sum up all the numeric type columns in each
    # group. This returns a dataframe where each row is the sum of the
    # group's numeric columns. You're not storing this dataframe in your
    # example.
    week_grouped.sum()

    # Here you're calling the to_csv method on a groupby object... but
    # that object type doesn't have that method. Dataframes have that method.
    # So we should store the previous line's result (a dataframe) into a variable
    # and then call its to_csv method.
    week_grouped.to_csv('week_grouped.csv')

    # Like this:
    summed_weeks = week_grouped.sum()
    summed_weeks.to_csv('...')

    # Or with less typing simply
    week_grouped.sum().to_csv('...')
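For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample values and the use of an in-memory `io.StringIO` buffer instead of a file path are assumptions for illustration, not the asker's data. Passing `index=False` is also an addition that is not in the answer above: without it, `to_csv` writes the integer row index created by `reset_index()` as an extra unnamed first column.

```python
import io

import pandas as pd

# Hypothetical sample data; the asker's real frame is not shown in the question.
df = pd.DataFrame({
    "week": [0, 0, 1, 1, 1],
    "count": [400, 495, 249, 250, 250],
})

# groupby() alone returns a DataFrameGroupBy; sum() turns it into a DataFrame
# indexed by week, and reset_index() moves week back into a regular column.
summed_weeks = df.groupby("week").sum().reset_index()

# Write to an in-memory buffer so the sketch leaves no file behind;
# a path such as 'week_grouped.csv' can be passed in the same position.
buffer = io.StringIO()
summed_weeks[["week", "count"]].to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The printed text has a `week,count` header followed by one row per week, which is the shape the question asks for.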
Alex Luis Arias answered Oct 06 '22 01:10


Try changing your second line to week_grouped = week_grouped.sum() and re-running all three lines.

If you run week_grouped.sum() in its own Jupyter notebook cell, you'll see that the statement displays its result in the cell's output instead of assigning it back to week_grouped. Some pandas methods have an inplace=True argument (e.g., df.sort_values(by=col_name, inplace=True)), but sum does not.
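A small sketch of that point, using made-up data (an assumption, since the asker's frame is not shown): calling sum() leaves the groupby object untouched and returns a new DataFrame, so the result has to be bound to a name before to_csv can be called on it.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"week": [0, 0, 1], "count": [1, 2, 3]})

week_grouped = df.groupby("week")
result = week_grouped.sum()  # returns a new DataFrame, changes nothing in place

# week_grouped itself is still a groupby object, not a DataFrame.
print(type(week_grouped).__name__)
print(type(result).__name__)

# Rebinding the name makes week_grouped a DataFrame, which does have to_csv().
week_grouped = week_grouped.sum()
csv_text = week_grouped.to_csv()  # no path given, so the CSV is returned as a string
print(csv_text)
```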

EDIT: does each week number only appear once in your CSV? If so, here's a simpler solution that doesn't use groupby:

    import pandas as pd

    df = pd.read_csv('input.csv')
    df[['id', 'count']].to_csv('output.csv')
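One way to check the premise of that edit before skipping groupby is `Series.is_unique`. The sketch below is only an illustration under assumptions: the sample frame is made up, and it selects a `week` column to match the CSV the question asks for, whereas the snippet above selects `'id'`.

```python
import pandas as pd

# Hypothetical sample data in which every week appears exactly once.
df = pd.DataFrame({
    "week": [0, 1, 2],
    "month": [82, 84, 126],
    "count": [895, 749, 872],
})

if df["week"].is_unique:
    # No aggregation needed: select the two columns directly.
    out = df[["week", "count"]]
else:
    # Weeks repeat, so sum count per week first.
    out = df.groupby("week", as_index=False)["count"].sum()

# index=False keeps the row index out of the CSV.
csv_text = out.to_csv(index=False)
print(csv_text)
```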
Peter Leimbigler answered Oct 05 '22 23:10