I have a DataFrame with an index called city_id
of cities in the format [city],[state]
(e.g., new york,ny
containing integer counts in the columns. The problem is that I have multiple rows for the same city, and I want to collapse the rows sharing a city_id
by adding their column values. I looked at groupby()
but it wasn't immediately obvious how to apply it to this problem.
Edit:
An example: I'd like to change this:
city_id val1 val2 val3 houston,tx 1 2 0 houston,tx 0 0 1 houston,tx 2 1 1
into this:
city_id val1 val2 val3 houston,tx 3 3 2
if there are ~10-20k rows.
To merge rows within a group together in Pandas we can use the agg(~) method together with the join(~) method to concatenate the row values.
merge() function to join the two data frames by inner join. Now, add a suffix called 'remove' for newly joined columns that have the same name in both data frames. Use the drop() function to remove the columns with the suffix 'remove'. This will ensure that identical columns don't exist in the new dataframe.
Starting from
>>> df val1 val2 val3 city_id houston,tx 1 2 0 houston,tx 0 0 1 houston,tx 2 1 1 somewhere,ew 4 3 7
I might do
>>> df.groupby(df.index).sum() val1 val2 val3 city_id houston,tx 3 3 2 somewhere,ew 4 3 7
or
>>> df.reset_index().groupby("city_id").sum() val1 val2 val3 city_id houston,tx 3 3 2 somewhere,ew 4 3 7
The first approach passes the index values (in this case, the city_id
values) to groupby
and tells it to use those as the group keys, and the second resets the index and then selects the city_id
column. See this section of the docs for more examples. Note that there are lots of other methods in the DataFrameGroupBy
objects, too:
>>> df.groupby(df.index) <pandas.core.groupby.DataFrameGroupBy object at 0x1045a1790> >>> df.groupby(df.index).max() val1 val2 val3 city_id houston,tx 2 2 1 somewhere,ew 4 3 7 >>> df.groupby(df.index).mean() val1 val2 val3 city_id houston,tx 1 1 0.666667 somewhere,ew 4 3 7.000000
Something in the same line. Sorry not the exact replica.
mydata = [{'subid' : 'B14-111', 'age': 75, 'fdg':1.78}, {'subid' : 'B14-112', 'age': 22, 'fdg':1.56},{'subid' : 'B14-112', 'age': 40, 'fdg':2.00},] df = pandas.DataFrame(mydata) gg = df.groupby("subid",sort=True).sum()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With