I have a DataFrame with an index called city_id, holding cities in the format [city],[state] (e.g., new york,ny), and integer counts in the columns. The problem is that I have multiple rows for the same city, and I want to collapse the rows sharing a city_id by adding their column values. I looked at groupby() but it wasn't immediately obvious how to apply it to this problem.
Edit:
An example: I'd like to change this:
```
city_id     val1  val2  val3
houston,tx     1     2     0
houston,tx     0     0     1
houston,tx     2     1     1
```

into this:

```
city_id     val1  val2  val3
houston,tx     3     3     2
```

The full DataFrame has ~10-20k rows.
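For concreteness, the tables above can be rebuilt like this (a minimal sketch; the duplicate city_id values live in the index, and the values come straight from the example):

```python
import pandas as pd

# Rebuild the example: duplicate city_id values in the index,
# integer counts in the columns.
df = pd.DataFrame(
    {"val1": [1, 0, 2], "val2": [2, 0, 1], "val3": [0, 1, 1]},
    index=pd.Index(["houston,tx", "houston,tx", "houston,tx"], name="city_id"),
)

# Collapse rows sharing a city_id by summing their columns.
# level=0 groups on the (only) index level.
collapsed = df.groupby(level=0).sum()
print(collapsed)
```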
Starting from
```
>>> df
              val1  val2  val3
city_id
houston,tx       1     2     0
houston,tx       0     0     1
houston,tx       2     1     1
somewhere,ew     4     3     7
```

I might do
```
>>> df.groupby(df.index).sum()
              val1  val2  val3
city_id
houston,tx       3     3     2
somewhere,ew     4     3     7
```

or
```
>>> df.reset_index().groupby("city_id").sum()
              val1  val2  val3
city_id
houston,tx       3     3     2
somewhere,ew     4     3     7
```

The first approach passes the index values (in this case, the city_id values) to groupby and tells it to use those as the group keys; the second resets the index and then groups by the city_id column. See this section of the docs for more examples. Note that there are lots of other methods on DataFrameGroupBy objects, too:
```
>>> df.groupby(df.index)
<pandas.core.groupby.DataFrameGroupBy object at 0x1045a1790>
>>> df.groupby(df.index).max()
              val1  val2  val3
city_id
houston,tx       2     2     1
somewhere,ew     4     3     7
>>> df.groupby(df.index).mean()
              val1  val2      val3
city_id
houston,tx       1     1  0.666667
somewhere,ew     4     3  7.000000
```
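If different columns need different reductions, agg applies them in one pass over the same groups. A sketch on the same hypothetical data (the column-to-function mapping here is just an illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"val1": [1, 0, 2, 4], "val2": [2, 0, 1, 3], "val3": [0, 1, 1, 7]},
    index=pd.Index(["houston,tx"] * 3 + ["somewhere,ew"], name="city_id"),
)

# One groupby pass, a different reduction per column.
out = df.groupby(df.index).agg({"val1": "sum", "val2": "max", "val3": "mean"})
print(out)
```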
Something along the same lines; sorry, it's not an exact replica:
```
import pandas

mydata = [
    {"subid": "B14-111", "age": 75, "fdg": 1.78},
    {"subid": "B14-112", "age": 22, "fdg": 1.56},
    {"subid": "B14-112", "age": 40, "fdg": 2.00},
]
df = pandas.DataFrame(mydata)

gg = df.groupby("subid", sort=True).sum()
```
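If you'd rather keep subid as an ordinary column instead of it becoming the index, as_index=False does that. A sketch on the same hypothetical data:

```python
import pandas as pd

mydata = [
    {"subid": "B14-111", "age": 75, "fdg": 1.78},
    {"subid": "B14-112", "age": 22, "fdg": 1.56},
    {"subid": "B14-112", "age": 40, "fdg": 2.00},
]
df = pd.DataFrame(mydata)

# as_index=False keeps subid as a regular column in the result;
# the two B14-112 rows collapse to age 62, fdg 3.56.
flat = df.groupby("subid", as_index=False, sort=True).sum()
print(flat)
```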