Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas List All Unique Values Based On Groupby

I have a dataframe that has worksite info.

District#    Site#           Address
        1        1    123 Bayview Ln
        1        2    456 Example St
        2       36      789 Hello Dr
        2       44      789 Hello Dr

I am trying to transform this dataframe to add a column with the highest Site# as well as the distinct addresses when I group by District#. Here is an example of what I want the output to look like:

District#    Site#           Address    MaxSite#            All District Addresses
        1        1    123 Bayview Ln           2     123 Bayview Ln,456 Example St
        1        2    456 Example St           2     123 Bayview Ln,456 Example St
        2       36      789 Hello Dr          44                      789 Hello Dr
        2       44      789 Hello Dr          44                      789 Hello Dr

I am able to get the Max Site# by doing

df['MaxSite#'] = df.groupby(by='District#')['Site#'].transform('max')

But I am trying to find a similar way to list all of the unique addresses when I groupby District#.

I have tried doing .transform('unique') but that is not a valid function name and doing .agg(['unique']) returns dimensions that do not match

like image 910
Bijan Avatar asked Sep 01 '25 15:09

Bijan


2 Answers

You can use groupby and agg to get the Max Site Number and List all the addresses

Then merge back to the original dataframe:

grouped_df = df.groupby('District#').agg(Max_Site_Num=('Site#', 'max'), 
      All_District_Addresses=('Address', lambda x: list(set(x))).reset_index()

df = df.merge(grouped_df,on='District#')

Output:

     District#  Site#         Address  Max_Site_Num            All_District_Addresses
0          1      1  123 Bayview Ln             2  [123 Bayview Ln, 456 Example St]
1          1      2  456 Example St             2  [123 Bayview Ln, 456 Example St]
2          2     36    789 Hello Dr            44                    [789 Hello Dr]
3          2     44    789 Hello Dr            44                    [789 Hello Dr]
like image 80
iBeMeltin Avatar answered Sep 05 '25 10:09

iBeMeltin


There are two steps to get the unqiue addresses.

  1. We obtain the unique addresses corresponding to each District# and make them into a single string
  2. We then create the All District Addresses Series by mapping District# to the unique_addresses DataFrame.
data = {
 "District#": [1, 1, 2, 2],
 "Site#": [1, 2, 36, 44],
 "Address": ["123 Bayview Ln", "456 Example St", "789 Hello Dr", "789 Hello Dr"]
}
df = pd.DataFrame(data)
# Series you already calculated
df['MaxSite#'] = df.groupby(by='District#')['Site#'].transform('max') 

# Adding per-district unique addresses 
unique_addresses = df.groupby(by="District#")["Address"].unique().apply(lambda x: ",".join(x))  # to make values a single string
df["All District Addresses"] = df["District#"].map(unique_addresses)

Output:

   District#  Site#         Address  MaxSite#         All District Addresses
0          1      1  123 Bayview Ln         2  123 Bayview Ln,456 Example St
1          1      2  456 Example St         2  123 Bayview Ln,456 Example St
2          2     36    789 Hello Dr        44                   789 Hello Dr
3          2     44    789 Hello Dr        44                   789 Hello Dr

A one-liner if you prefer:

df["All District Addresses"] = df["District#"].map(df.groupby(by="District#")["Address"].unique().apply(lambda x: ",".join(x)))
like image 28
Capybara Avatar answered Sep 05 '25 10:09

Capybara